Skip to content
4 changes: 2 additions & 2 deletions doc/source/user_guide/cookbook.rst
Original file line number Diff line number Diff line change
Expand Up @@ -459,7 +459,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
df

# List the size of the animals with the highest weight.
df.groupby("animal").apply(lambda subf: subf["size"][subf["weight"].idxmax()])
df.groupby("animal").apply(lambda subf: subf["size"][subf["weight"].idxmax()], include_groups=False)

`Using get_group
<https://stackoverflow.com/questions/14734533/how-to-access-pandas-groupby-dataframe-by-key>`__
Expand All @@ -482,7 +482,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
return pd.Series(["L", avg_weight, True], index=["size", "weight", "adult"])


expected_df = gb.apply(GrowUp)
expected_df = gb.apply(GrowUp, include_groups=False)
expected_df

`Expanding apply
Expand Down
14 changes: 10 additions & 4 deletions doc/source/user_guide/groupby.rst
Original file line number Diff line number Diff line change
Expand Up @@ -420,6 +420,12 @@ This is mainly syntactic sugar for the alternative, which is much more verbose:
Additionally, this method avoids recomputing the internal grouping information
derived from the passed key.

You can also include the grouping columns if you want to operate on them.

.. ipython:: python

grouped[["A", "B"]].sum()

.. _groupby.iterating-label:

Iterating through groups
Expand Down Expand Up @@ -1053,7 +1059,7 @@ missing values with the ``ffill()`` method.
).set_index("date")
df_re

df_re.groupby("group").resample("1D").ffill()
df_re.groupby("group").resample("1D", include_groups=False).ffill()

.. _groupby.filter:

Expand Down Expand Up @@ -1219,13 +1225,13 @@ the argument ``group_keys`` which defaults to ``True``. Compare

.. ipython:: python

df.groupby("A", group_keys=True).apply(lambda x: x)
df.groupby("A", group_keys=True).apply(lambda x: x, include_groups=False)

with

.. ipython:: python

df.groupby("A", group_keys=False).apply(lambda x: x)
df.groupby("A", group_keys=False).apply(lambda x: x, include_groups=False)


Numba Accelerated Routines
Expand Down Expand Up @@ -1709,7 +1715,7 @@ column index name will be used as the name of the inserted column:
result = {"b_sum": x["b"].sum(), "c_mean": x["c"].mean()}
return pd.Series(result, name="metrics")

result = df.groupby("a").apply(compute_metrics)
result = df.groupby("a").apply(compute_metrics, include_groups=False)

result

Expand Down
21 changes: 16 additions & 5 deletions doc/source/whatsnew/v0.14.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -328,13 +328,24 @@ More consistent behavior for some groupby methods:

- groupby ``head`` and ``tail`` now act more like ``filter`` rather than an aggregation:

.. ipython:: python
.. code-block:: ipython

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
g = df.groupby('A')
g.head(1) # filters DataFrame
In [1]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

g.apply(lambda x: x.head(1)) # used to simply fall-through
In [2]: g = df.groupby('A')

In [3]: g.head(1) # filters DataFrame
Out[3]:
A B
0 1 2
2 5 6

In [4]: g.apply(lambda x: x.head(1)) # used to simply fall-through
Out[4]:
A B
A
1 0 1 2
5 2 5 6

- groupby head and tail respect column selection:

Expand Down
93 changes: 87 additions & 6 deletions doc/source/whatsnew/v0.18.1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -77,9 +77,52 @@ Previously you would have to do this to get a rolling window mean per-group:
df = pd.DataFrame({"A": [1] * 20 + [2] * 12 + [3] * 8, "B": np.arange(40)})
df
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this DataFrame to make the below output be of reasonable size: df = pd.DataFrame({"A": [1] * 10 + [2] * 6 + [3] * 4, "B": np.arange(20)})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the motivation here? In general, I don't think we should be rewriting old release notes unless there is an issue we're fixing.


.. ipython:: python
.. code-block:: ipython

df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
In [1]: df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
Out[1]:
A
1 0 NaN
1 NaN
2 NaN
3 1.5
4 2.5
5 3.5
6 4.5
7 5.5
8 6.5
9 7.5
10 8.5
11 9.5
12 10.5
13 11.5
14 12.5
15 13.5
16 14.5
17 15.5
18 16.5
19 17.5
2 20 NaN
21 NaN
22 NaN
23 21.5
24 22.5
25 23.5
26 24.5
27 25.5
28 26.5
29 27.5
30 28.5
31 29.5
3 32 NaN
33 NaN
34 NaN
35 33.5
36 34.5
37 35.5
38 36.5
39 37.5
Name: B, dtype: float64

Now you can do:

Expand All @@ -101,15 +144,53 @@ For ``.resample(..)`` type of operations, previously you would have to:

df

.. ipython:: python
.. code-block:: ipython

df.groupby("group").apply(lambda x: x.resample("1D").ffill())
In[1]: df.groupby("group").apply(lambda x: x.resample("1D").ffill())
Out[1]:
group val
group date
1 2016-01-03 1 5
2016-01-04 1 5
2016-01-05 1 5
2016-01-06 1 5
2016-01-07 1 5
2016-01-08 1 5
2016-01-09 1 5
2016-01-10 1 6
2 2016-01-17 2 7
2016-01-18 2 7
2016-01-19 2 7
2016-01-20 2 7
2016-01-21 2 7
2016-01-22 2 7
2016-01-23 2 7
2016-01-24 2 8

Now you can do:

.. ipython:: python
.. code-block:: ipython

df.groupby("group").resample("1D").ffill()
In[1]: df.groupby("group").resample("1D").ffill()
Out[1]:
group val
group date
1 2016-01-03 1 5
2016-01-04 1 5
2016-01-05 1 5
2016-01-06 1 5
2016-01-07 1 5
2016-01-08 1 5
2016-01-09 1 5
2016-01-10 1 6
2 2016-01-17 2 7
2016-01-18 2 7
2016-01-19 2 7
2016-01-20 2 7
2016-01-21 2 7
2016-01-22 2 7
2016-01-23 2 7
2016-01-24 2 8

.. _whatsnew_0181.enhancements.method_chain:

Expand Down
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v2.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -146,12 +146,12 @@ Deprecations
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_pickle` except ``path``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_string` except ``buf``. (:issue:`54229`)
- Deprecated downcasting behavior in :meth:`Series.where`, :meth:`DataFrame.where`, :meth:`Series.mask`, :meth:`DataFrame.mask`, :meth:`Series.clip`, :meth:`DataFrame.clip`; in a future version these will not infer object-dtype columns to non-object dtype, or all-round floats to integer dtype. Call ``result.infer_objects(copy=False)`` on the result for object inference, or explicitly cast floats to ints. To opt in to the future version, use ``pd.set_option("future.downcasting", True)`` (:issue:`53656`)
- Deprecated including the groups in computations when using :meth:`DataFrameGroupBy.apply` and :meth:`DataFrameGroupBy.resample`; pass ``include_groups=False`` to exclude the groups (:issue:`7155`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the groupby rolling/expanding/ewm ops affected by this too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not believe so. The only tests that hit groupby.apply in pandas/tests/window are where they directly call groupby.apply to produce the expected result. I also grepped the code and could not find any calls to groupby.apply in pandas.core.window.

- Deprecated not passing a tuple to :class:`DataFrameGroupBy.get_group` or :class:`SeriesGroupBy.get_group` when grouping by a length-1 list-like (:issue:`25971`)
- Deprecated strings ``S``, ``U``, and ``N`` denoting units in :func:`to_timedelta` (:issue:`52536`)
- Deprecated strings ``T``, ``S``, ``L``, ``U``, and ``N`` denoting frequencies in :class:`Minute`, :class:`Second`, :class:`Milli`, :class:`Micro`, :class:`Nano` (:issue:`52536`)
- Deprecated strings ``T``, ``S``, ``L``, ``U``, and ``N`` denoting units in :class:`Timedelta` (:issue:`52536`)
- Deprecated the extension test classes ``BaseNoReduceTests``, ``BaseBooleanReduceTests``, and ``BaseNumericReduceTests``, use ``BaseReduceTests`` instead (:issue:`54663`)
-

.. ---------------------------------------------------------------------------
.. _whatsnew_220.performance:
Expand Down
26 changes: 13 additions & 13 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -8869,20 +8869,20 @@ def update(
>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
... 'Parrot', 'Parrot'],
... 'Max Speed': [380., 370., 24., 26.]})
>>> df.groupby("Animal", group_keys=True).apply(lambda x: x)
Animal Max Speed
>>> df.groupby("Animal", group_keys=True)[['Max Speed']].apply(lambda x: x)
Max Speed
Animal
Falcon 0 Falcon 380.0
1 Falcon 370.0
Parrot 2 Parrot 24.0
3 Parrot 26.0

>>> df.groupby("Animal", group_keys=False).apply(lambda x: x)
Animal Max Speed
0 Falcon 380.0
1 Falcon 370.0
2 Parrot 24.0
3 Parrot 26.0
Falcon 0 380.0
1 370.0
Parrot 2 24.0
3 26.0

>>> df.groupby("Animal", group_keys=False)[['Max Speed']].apply(lambda x: x)
Max Speed
0 380.0
1 370.0
2 24.0
3 26.0
"""
)
)
Expand Down
Loading