4 changes: 2 additions & 2 deletions doc/source/user_guide/cookbook.rst
@@ -459,7 +459,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
df

# List the size of the animals with the highest weight.
df.groupby("animal").apply(lambda subf: subf["size"][subf["weight"].idxmax()])
df.groupby("animal")[["size", "weight"]].apply(lambda subf: subf["size"][subf["weight"].idxmax()])

`Using get_group
<https://stackoverflow.com/questions/14734533/how-to-access-pandas-groupby-dataframe-by-key>`__
@@ -482,7 +482,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
return pd.Series(["L", avg_weight, True], index=["size", "weight", "adult"])


expected_df = gb.apply(GrowUp)
expected_df = gb[["size", "weight"]].apply(GrowUp)
expected_df

`Expanding apply
14 changes: 10 additions & 4 deletions doc/source/user_guide/groupby.rst
@@ -430,6 +430,12 @@ This is mainly syntactic sugar for the alternative, which is much more verbose:
Additionally, this method avoids recomputing the internal grouping information
derived from the passed key.

You can also include the grouping columns if you want to operate on them.

.. ipython:: python

grouped[["A", "B"]].sum()

.. _groupby.iterating-label:

Iterating through groups
@@ -1067,7 +1073,7 @@ missing values with the ``ffill()`` method.
).set_index("date")
df_re

df_re.groupby("group").resample("1D").ffill()
df_re.groupby("group")[["val"]].resample("1D").ffill()

.. _groupby.filter:

@@ -1233,13 +1239,13 @@ the argument ``group_keys`` which defaults to ``True``. Compare

.. ipython:: python

df.groupby("A", group_keys=True).apply(lambda x: x)
df.groupby("A", group_keys=True)[["B", "C", "D"]].apply(lambda x: x)

with

.. ipython:: python

df.groupby("A", group_keys=False).apply(lambda x: x)
df.groupby("A", group_keys=False)[["B", "C", "D"]].apply(lambda x: x)


Numba Accelerated Routines
@@ -1722,7 +1728,7 @@ column index name will be used as the name of the inserted column:
result = {"b_sum": x["b"].sum(), "c_mean": x["c"].mean()}
return pd.Series(result, name="metrics")

result = df.groupby("a").apply(compute_metrics)
result = df.groupby("a")[["b", "c"]].apply(compute_metrics)

result

22 changes: 17 additions & 5 deletions doc/source/whatsnew/v0.14.0.rst
@@ -328,13 +328,25 @@ More consistent behavior for some groupby methods:

- groupby ``head`` and ``tail`` now act more like ``filter`` rather than an aggregation:

.. ipython:: python
.. code-block:: ipython

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
g = df.groupby('A')
g.head(1) # filters DataFrame
In [1]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

In [2]: g = df.groupby('A')

In [3]: g.head(1) # filters DataFrame
Out[3]:
   A  B
0  1  2
2  5  6

In [4]: g.apply(lambda x: x.head(1)) # used to simply fall-through
Out[4]:
     A  B
A
1 0  1  2
5 2  5  6

g.apply(lambda x: x.head(1)) # used to simply fall-through

- groupby head and tail respect column selection:

93 changes: 87 additions & 6 deletions doc/source/whatsnew/v0.18.1.rst
@@ -77,9 +77,52 @@ Previously you would have to do this to get a rolling window mean per-group:
df = pd.DataFrame({"A": [1] * 20 + [2] * 12 + [3] * 8, "B": np.arange(40)})
df

.. ipython:: python
.. code-block:: ipython

df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
In [1]: df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
Out[1]:
A
1  0      NaN
   1      NaN
   2      NaN
   3      1.5
   4      2.5
   5      3.5
   6      4.5
   7      5.5
   8      6.5
   9      7.5
   10     8.5
   11     9.5
   12    10.5
   13    11.5
   14    12.5
   15    13.5
   16    14.5
   17    15.5
   18    16.5
   19    17.5
2  20     NaN
   21     NaN
   22     NaN
   23    21.5
   24    22.5
   25    23.5
   26    24.5
   27    25.5
   28    26.5
   29    27.5
   30    28.5
   31    29.5
3  32     NaN
   33     NaN
   34     NaN
   35    33.5
   36    34.5
   37    35.5
   38    36.5
   39    37.5
Name: B, dtype: float64

Now you can do:

@@ -101,15 +144,53 @@ For ``.resample(..)`` type of operations, previously you would have to:

df

.. ipython:: python
.. code-block:: ipython

df.groupby("group").apply(lambda x: x.resample("1D").ffill())
In [1]: df.groupby("group").apply(lambda x: x.resample("1D").ffill())
Out[1]:
                  group  val
group date
1     2016-01-03      1    5
      2016-01-04      1    5
      2016-01-05      1    5
      2016-01-06      1    5
      2016-01-07      1    5
      2016-01-08      1    5
      2016-01-09      1    5
      2016-01-10      1    6
2     2016-01-17      2    7
      2016-01-18      2    7
      2016-01-19      2    7
      2016-01-20      2    7
      2016-01-21      2    7
      2016-01-22      2    7
      2016-01-23      2    7
      2016-01-24      2    8

Now you can do:

.. ipython:: python
.. code-block:: ipython

df.groupby("group").resample("1D").ffill()
In [1]: df.groupby("group").resample("1D").ffill()
Out[1]:
                  group  val
group date
1     2016-01-03      1    5
      2016-01-04      1    5
      2016-01-05      1    5
      2016-01-06      1    5
      2016-01-07      1    5
      2016-01-08      1    5
      2016-01-09      1    5
      2016-01-10      1    6
2     2016-01-17      2    7
      2016-01-18      2    7
      2016-01-19      2    7
      2016-01-20      2    7
      2016-01-21      2    7
      2016-01-22      2    7
      2016-01-23      2    7
      2016-01-24      2    8

.. _whatsnew_0181.enhancements.method_chain:

1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.0.rst
@@ -149,6 +149,7 @@ Other API changes

Deprecations
~~~~~~~~~~~~
- Deprecated :meth:`.DataFrameGroupBy.apply` and :class:`.Resampler` methods operating on the grouping column(s); select the columns to operate on after groupby to either explicitly include or exclude the groupings and avoid the ``FutureWarning`` (:issue:`7155`)
- Deprecated silently dropping unrecognized timezones when parsing strings to datetimes (:issue:`18702`)
- Deprecated :meth:`DataFrame._data` and :meth:`Series._data`, use public APIs instead (:issue:`33333`)
- Deprecated :meth:`.Groupby.all` and :meth:`.GroupBy.any` with datetime64 or :class:`PeriodDtype` values, matching the :class:`Series` and :class:`DataFrame` deprecations (:issue:`34479`)
20 changes: 20 additions & 0 deletions pandas/core/groupby/groupby.py
@@ -1487,6 +1487,16 @@ def f(g):
with option_context("mode.chained_assignment", None):
try:
result = self._python_apply_general(f, self._selected_obj)
if (
not isinstance(self.obj, Series)
and self._selection is None
and self._selected_obj.shape != self._obj_with_exclusions.shape
):
warnings.warn(
message=_apply_groupings_depr.format(type(self).__name__),
category=FutureWarning,
stacklevel=find_stack_level(),
)
except TypeError:
# gh-20949
# try again, with .apply acting as a filtering
@@ -4313,3 +4323,13 @@ def _insert_quantile_level(idx: Index, qs: npt.NDArray[np.float64]) -> MultiIndex:
else:
mi = MultiIndex.from_product([idx, qs])
return mi


# GH#7155
_apply_groupings_depr = (
"{}.apply operated on the grouping columns. This behavior is deprecated, "
"and in a future version of pandas the grouping columns will be excluded "
"from the operation. Select the columns to operate on after groupby "
"to either explicitly include or exclude the groupings and silence "
"this warning."
)
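The condition added in ``groupby.py`` detects implicit use of the grouping columns by comparing shapes: when no explicit column selection was made and the selected frame still contains the groupings, its shape differs from the frame with groupings excluded. A standalone illustration of that idea (``selected_obj`` and ``obj_with_exclusions`` below are stand-ins for the internal ``_selected_obj`` and ``_obj_with_exclusions`` attributes):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2], "B": [3, 4, 5]})

# groupby("A") with no explicit column selection keeps "A" in the frame
selected_obj = df
# the frame pandas uses once the grouping column is excluded
obj_with_exclusions = df.drop(columns="A")

# Shapes differ, so apply would have operated on the grouping column "A";
# this is exactly the case the new FutureWarning flags.
operates_on_groupings = selected_obj.shape != obj_with_exclusions.shape
```

A user who writes ``df.groupby("A")[["B"]]`` makes a selection, the shapes match, and no warning is emitted.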
36 changes: 32 additions & 4 deletions pandas/core/resample.py
@@ -33,7 +33,10 @@
Substitution,
doc,
)
from pandas.util._exceptions import find_stack_level
from pandas.util._exceptions import (
find_stack_level,
rewrite_warning,
)

from pandas.core.dtypes.generic import (
ABCDataFrame,
@@ -52,6 +55,7 @@
from pandas.core.groupby.groupby import (
BaseGroupBy,
GroupBy,
_apply_groupings_depr,
_pipe_template,
get_groupby,
)
@@ -420,6 +424,9 @@ def _groupby_and_aggregate(self, how, *args, **kwargs):
obj, by=None, grouper=grouper, axis=self.axis, group_keys=self.group_keys
)

target_message = "DataFrameGroupBy.apply operated on the grouping columns"
new_message = _apply_groupings_depr.format(type(self).__name__)

try:
if callable(how):
# TODO: test_resample_apply_with_additional_args fails if we go
@@ -436,7 +443,12 @@
# a DataFrame column, but aggregate_item_by_item operates column-wise
# on Series, raising AttributeError or KeyError
# (depending on whether the column lookup uses getattr/__getitem__)
result = grouped.apply(how, *args, **kwargs)
with rewrite_warning(
target_message=target_message,
target_category=FutureWarning,
new_message=new_message,
):
result = grouped.apply(how, *args, **kwargs)

except ValueError as err:
if "Must produce aggregated value" in str(err):
@@ -448,7 +460,12 @@

# we have a non-reducing function
# try to evaluate
result = grouped.apply(how, *args, **kwargs)
with rewrite_warning(
target_message=target_message,
target_category=FutureWarning,
new_message=new_message,
):
result = grouped.apply(how, *args, **kwargs)

return self._wrap_result(result)

@@ -1344,7 +1361,18 @@ def func(x):

return x.apply(f, *args, **kwargs)

result = self._groupby.apply(func)
msg = (
"DataFrameGroupBy.resample operated on the grouping columns. "
"This behavior is deprecated, and in a future version of "
"pandas the grouping columns will be excluded from the operation. "
"Subset the data to exclude the groupings and silence this warning."
)
with rewrite_warning(
target_message="DataFrameGroupBy.apply operated on the grouping columns",
target_category=FutureWarning,
new_message=msg,
):
result = self._groupby.apply(func)
return self._wrap_result(result)

_upsample = _apply
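The ``rewrite_warning`` context manager used throughout ``resample.py`` intercepts a warning whose message matches a target and re-emits it with a new message, so the resample paths can present ``DataFrameGroupBy.apply``'s deprecation under their own name. A minimal stdlib sketch of the idea (an approximation, not the actual implementation in ``pandas.util._exceptions``):

```python
import warnings
from contextlib import contextmanager

@contextmanager
def rewrite_warning(target_message, target_category, new_message):
    # Record every warning raised inside the block, then re-emit each one,
    # swapping in new_message when both the category and the message
    # prefix match the target.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        yield
    for w in caught:
        message = str(w.message)
        if w.category is target_category and message.startswith(target_message):
            message = new_message
        warnings.warn_explicit(message, w.category, w.filename, w.lineno)
```

Warnings that do not match the target pass through unchanged, which is why the category must stay ``FutureWarning`` for the rewrite to apply.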
4 changes: 2 additions & 2 deletions pandas/core/reshape/pivot.py
@@ -457,7 +457,7 @@ def _all_key():
return (margins_name,) + ("",) * (len(cols) - 1)

if len(rows) > 0:
margin = data[rows].groupby(rows, observed=observed).apply(aggfunc)
margin = data.groupby(rows, observed=observed)[rows].apply(aggfunc)
all_key = _all_key()
table[all_key] = margin
result = table
@@ -475,7 +475,7 @@ def _all_key():
margin_keys = table.columns

if len(cols):
row_margin = data[cols].groupby(cols, observed=observed).apply(aggfunc)
row_margin = data.groupby(cols, observed=observed)[cols].apply(aggfunc)
else:
row_margin = Series(np.nan, index=result.columns)

8 changes: 6 additions & 2 deletions pandas/tests/extension/base/groupby.py
@@ -99,9 +99,13 @@ def test_groupby_extension_transform(self, data_for_grouping):

def test_groupby_extension_apply(self, data_for_grouping, groupby_apply_op):
df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3, 1, 4], "B": data_for_grouping})
df.groupby("B", group_keys=False).apply(groupby_apply_op)
msg = "DataFrameGroupBy.apply operated on the grouping columns"
with tm.assert_produces_warning(FutureWarning, match=msg):
df.groupby("B", group_keys=False).apply(groupby_apply_op)
df.groupby("B", group_keys=False).A.apply(groupby_apply_op)
df.groupby("A", group_keys=False).apply(groupby_apply_op)
msg = "DataFrameGroupBy.apply operated on the grouping columns"
with tm.assert_produces_warning(FutureWarning, match=msg):
df.groupby("A", group_keys=False).apply(groupby_apply_op)
df.groupby("A", group_keys=False).B.apply(groupby_apply_op)

def test_groupby_apply_identity(self, data_for_grouping):
8 changes: 6 additions & 2 deletions pandas/tests/extension/test_boolean.py
@@ -298,9 +298,13 @@ def test_groupby_extension_transform(self, data_for_grouping):

def test_groupby_extension_apply(self, data_for_grouping, groupby_apply_op):
df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3, 1], "B": data_for_grouping})
df.groupby("B", group_keys=False).apply(groupby_apply_op)
msg = "DataFrameGroupBy.apply operated on the grouping columns"
with tm.assert_produces_warning(FutureWarning, match=msg):
df.groupby("B", group_keys=False).apply(groupby_apply_op)
df.groupby("B", group_keys=False).A.apply(groupby_apply_op)
df.groupby("A", group_keys=False).apply(groupby_apply_op)
msg = "DataFrameGroupBy.apply operated on the grouping columns"
with tm.assert_produces_warning(FutureWarning, match=msg):
df.groupby("A", group_keys=False).apply(groupby_apply_op)
df.groupby("A", group_keys=False).B.apply(groupby_apply_op)

def test_groupby_apply_identity(self, data_for_grouping):
4 changes: 3 additions & 1 deletion pandas/tests/frame/test_stack_unstack.py
@@ -1577,7 +1577,9 @@ def test_unstack_bug(self):
}
)

result = df.groupby(["state", "exp", "barcode", "v"]).apply(len)
msg = "DataFrameGroupBy.apply operated on the grouping columns"
with tm.assert_produces_warning(FutureWarning, match=msg):
result = df.groupby(["state", "exp", "barcode", "v"]).apply(len)

unstacked = result.unstack()
restacked = unstacked.stack()
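Taken together, the documentation and test changes above all follow the same migration path: select the columns to operate on after ``groupby`` so the grouping column is excluded explicitly and no ``FutureWarning`` is emitted. A small sketch of the recommended pattern (works on pandas both before and after this deprecation):

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 2, 3], "C": [4, 5, 6]})

# Deprecated by this change: df.groupby("A").apply(lambda g: g.sum())
# would pass the grouping column "A" to the callable as well.
# Recommended: select the non-grouping columns first.
result = df.groupby("A")[["B", "C"]].apply(lambda g: g.sum())
```

Because the callable returns a Series per group, ``result`` is a DataFrame indexed by the group keys with columns ``B`` and ``C`` only.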