improve agg docs
Colin Ho authored and Colin Ho committed Sep 6, 2024
1 parent 6e9bd1a commit 6e07974
Showing 2 changed files with 73 additions and 18 deletions.
4 changes: 4 additions & 0 deletions daft/dataframe/dataframe.py
@@ -2179,6 +2179,8 @@ def agg(self, *to_agg: Union[Expression, Iterable[Expression]]) -> "DataFrame":
"""Perform aggregations on this DataFrame. Allows for mixed aggregations for multiple columns
Will return a single row that aggregated the entire DataFrame.
For a full list of aggregation expressions, see :ref:`Aggregation Expressions <api=aggregation-expression>`
Example:
>>> import daft
>>> from daft import col
@@ -2834,6 +2836,8 @@ def agg_concat(self, *cols: ColumnInputType) -> "DataFrame":
def agg(self, *to_agg: Union[Expression, Iterable[Expression]]) -> "DataFrame":
"""Perform aggregations on this GroupedDataFrame. Allows for mixed aggregations.
For a full list of aggregation expressions, see :ref:`Aggregation Expressions <api=aggregation-expression>`
Example:
>>> import daft
>>> from daft import col
87 changes: 69 additions & 18 deletions docs/source/user_guide/daft_in_depth/aggregations.rst
@@ -3,8 +3,6 @@ Aggregations and Grouping

Some operations such as the sum or the average of a column are called **aggregations**. Aggregations are operations that reduce the number of rows in a column.

For a full list of available aggregations, see: :ref:`df-aggregations`.

Global Aggregations
-------------------

@@ -23,13 +21,41 @@ An aggregation can be applied on an entire DataFrame, for example to get the mea
.. code:: none

    +-----------+
    | score     |
    | Float64   |
    +===========+
    | 25        |
    +-----------+
    (Showing first 1 rows)

    ╭─────────╮
    │ score   │
    │ ---     │
    │ Float64 │
    ╞═════════╡
    │ 25      │
    ╰─────────╯
    (Showing first 1 of 1 rows)

For a full list of available DataFrame aggregations, see: :ref:`df-aggregations`.

Aggregations can also be mixed and matched across columns, via the `agg` method:

.. code:: python

    df.agg(
        df["score"].mean().alias("mean_score"),
        df["score"].max().alias("max_score"),
        df["class"].count().alias("class_count"),
    ).show()

.. code:: none

    ╭────────────┬───────────┬─────────────╮
    │ mean_score ┆ max_score ┆ class_count │
    │ ---        ┆ ---       ┆ ---         │
    │ Float64    ┆ Float64   ┆ UInt64      │
    ╞════════════╪═══════════╪═════════════╡
    │ 25         ┆ 40        ┆ 4           │
    ╰────────────┴───────────┴─────────────╯
    (Showing first 1 of 1 rows)

For a full list of available aggregation expressions, see: :ref:`Aggregation Expressions <api=aggregation-expression>`
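
As a rough cross-check of the single-row result, the same three aggregations can be computed by hand in plain Python. This is only a sketch of what ``agg`` computes, not how Daft executes it. The sample rows are an assumption (the DataFrame's construction is not part of this diff), chosen so that the results match the documented output, and ``global_agg`` is a hypothetical helper name.

```python
from statistics import mean

# Assumed sample data: not shown in this diff, chosen to reproduce the documented output.
data = {
    "class": ["a", "a", "b", "b"],
    "score": [10.0, 20.0, 30.0, 40.0],
}


def global_agg(columns):
    """Reduce whole columns to a single row of aggregated values."""
    return {
        "mean_score": mean(columns["score"]),
        "max_score": max(columns["score"]),
        # Assumption: count tallies non-null values; None stands in for null here.
        "class_count": sum(value is not None for value in columns["class"]),
    }


row = global_agg(data)
print(row)
```

Each output key plays the role of an ``alias`` in the ``agg`` call: one named value per aggregation expression, all collected into a single row.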

Grouped Aggregations
--------------------
@@ -44,12 +70,37 @@ Let's run the mean of column "score" again, but this time grouped by "class":
.. code:: none

    +---------+-----------+
    | class   | score     |
    | Utf8    | Float64   |
    +=========+===========+
    | b       | 35        |
    +---------+-----------+
    | a       | 15        |
    +---------+-----------+
    (Showing first 2 rows)

    ╭───────┬─────────╮
    │ class ┆ score   │
    │ ---   ┆ ---     │
    │ Utf8  ┆ Float64 │
    ╞═══════╪═════════╡
    │ a     ┆ 15      │
    ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
    │ b     ┆ 35      │
    ╰───────┴─────────╯
    (Showing first 2 of 2 rows)

To run multiple aggregations on a Grouped DataFrame, you can use the `agg` method:

.. code:: python

    df.groupby("class").agg(
        df["score"].mean().alias("mean_score"),
        df["score"].max().alias("max_score"),
    ).show()

.. code:: none

    ╭───────┬────────────┬───────────╮
    │ class ┆ mean_score ┆ max_score │
    │ ---   ┆ ---        ┆ ---       │
    │ Utf8  ┆ Float64    ┆ Float64   │
    ╞═══════╪════════════╪═══════════╡
    │ a     ┆ 15         ┆ 20        │
    ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
    │ b     ┆ 35         ┆ 40        │
    ╰───────┴────────────┴───────────╯
    (Showing first 2 of 2 rows)
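
The grouped result can be sanity-checked the same way. The sketch below is plain Python rather than Daft: it buckets rows by "class" and then applies each aggregation per bucket, which is the behavior the grouped ``agg`` output reflects. The sample rows are again an assumption chosen to match the documented output, and ``grouped_agg`` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Assumed sample rows as (class, score) pairs; not shown in this diff.
rows = [("a", 10.0), ("a", 20.0), ("b", 30.0), ("b", 40.0)]


def grouped_agg(pairs):
    """Bucket scores by key, then reduce each bucket to one row of aggregates."""
    groups = defaultdict(list)
    for key, score in pairs:
        groups[key].append(score)
    # Keys are sorted only to make the printed result easy to compare.
    return {
        key: {"mean_score": mean(scores), "max_score": max(scores)}
        for key, scores in sorted(groups.items())
    }


result = grouped_agg(rows)
for key, aggregates in result.items():
    print(key, aggregates)
```

Note that the old and new outputs in this diff list the groups in different orders ("b" then "a" versus "a" then "b"), so the sorting here is a presentation choice in the sketch only.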
