Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
% This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.

```esql
FROM employees
| WHERE languages < 3
| STATS `count` = COUNT(*) BY languages
```

| count:long | languages:integer |
| --- | --- |
| 15 | 1 |
| 19 | 2 |
@@ -0,0 +1,15 @@
% This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.

```esql
FROM employees
| STATS `count` = COUNT(*) WHERE languages < 3 BY languages
```

| count:long | languages:integer |
| --- | --- |
| 15 | 1 |
| 19 | 2 |
| 0 | 3 |
| 0 | 4 |
| 0 | 5 |
| 0 | null |
@@ -6,7 +6,7 @@ stack: ga
The `STATS` processing command groups rows according to a common value
and calculates one or more aggregated values over the grouped rows.

**Syntax**
## Syntax

```esql
STATS [column1 =] expression1 [WHERE boolean_expression1][,
@@ -15,7 +15,7 @@ STATS [column1 =] expression1 [WHERE boolean_expression1][,
[BY grouping_expression1[, ..., grouping_expressionN]]
```

**Parameters**
## Parameters

`columnX`
: The name by which the aggregated value is returned. If omitted, the name is
@@ -31,14 +31,24 @@ STATS [column1 =] expression1 [WHERE boolean_expression1][,
If its name coincides with one of the computed columns, that column will be ignored.

`boolean_expressionX`
: The condition that must be met for a row to be included in the evaluation of `expressionX`.
: The condition that must be met for a row to be included in the evaluation of
`expressionX`. It has no effect on `grouping_expressionX` or other aggregation
expressions. Consequently, the following are _not_ equivalent:

```esql
... | STATS ... WHERE <condition> ...
```

```esql
... | WHERE <condition> | STATS ...
```

::::{note}
Individual `null` values are skipped when computing aggregations.
::::


**Description**
## Description

The `STATS` processing command groups rows according to a common value
and calculates one or more aggregated values over the grouped rows. For the
@@ -75,42 +85,71 @@ and then grouping - that is not going to be faster.
::::


### Examples
## Examples

The following examples demonstrate common `STATS` patterns.

### Group by column

Calculating a statistic and grouping by the values of another column:
Combine an aggregation with `BY` to compute a value for each group:

:::{include} ../examples/stats.csv-spec/stats.md
:::

### Aggregate without grouping

Omitting `BY` returns one row with the aggregations applied over the entire
dataset:

:::{include} ../examples/stats.csv-spec/statsWithoutBy.md
:::

It’s possible to calculate multiple values:
### Calculate multiple values

Separate multiple aggregations with commas to compute them in a single pass:

:::{include} ../examples/stats.csv-spec/statsCalcMultipleValues.md
:::

To filter the rows that go into an aggregation, use the `WHERE` clause:
### Filter aggregations with WHERE

Use per-aggregation `WHERE` to compute conditional metrics from the same
dataset in a single pass:

:::{include} ../examples/stats.csv-spec/aggFiltering.md
:::

The aggregations can be mixed, with and without a filter and grouping is
optional as well:
### Mix filtered and unfiltered aggregations

Filtered and unfiltered aggregations can be freely mixed. Grouping is also
optional:

:::{include} ../examples/stats.csv-spec/aggFilteringNoGroup.md
:::

It’s also possible to group by multiple values:
### Filter on the grouping key

The `WHERE` clause can also filter on the grouping key. The group still appears
in the output, but the aggregation returns its default value (`0` for `COUNT`):

:::{include} ../examples/stats.csv-spec/aggFilteringOnGroup.md
:::

Compare this to filtering with `WHERE` before `STATS`, where rows are excluded
before grouping, so non-matching groups don't appear in the output at all:

:::{include} ../examples/stats.csv-spec/aggFilteringBefore.md
:::
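
As a rough mental model only (not how Elasticsearch implements `STATS`), the difference between the two queries above can be sketched in Python. The sample rows and the helper names `matches`, `count_filter_before`, and `count_filter_per_agg` are hypothetical, chosen purely for illustration, and `None` stands in for a `null` grouping key:

```python
from collections import defaultdict

# Hypothetical sample rows; None stands in for a null `languages` value.
rows = [{"languages": v} for v in [1, 1, 2, 3, 3, None]]


def matches(row):
    """The filter `languages < 3`; a null value is treated as not matching."""
    value = row["languages"]
    return value is not None and value < 3


def count_filter_before(rows):
    """Models `WHERE languages < 3 | STATS count = COUNT(*) BY languages`."""
    counts = defaultdict(int)
    for row in rows:
        # Non-matching rows are dropped before grouping, so their
        # groups are never created.
        if matches(row):
            counts[row["languages"]] += 1
    return dict(counts)


def count_filter_per_agg(rows):
    """Models `STATS count = COUNT(*) WHERE languages < 3 BY languages`."""
    counts = {}
    for row in rows:
        key = row["languages"]
        # Every row creates its group, whether or not it passes the filter.
        counts.setdefault(key, 0)
        # The filter only decides whether the row feeds this aggregation.
        if matches(row):
            counts[key] += 1
    return counts


print(count_filter_before(rows))   # {1: 2, 2: 1}
print(count_filter_per_agg(rows))  # {1: 2, 2: 1, 3: 0, None: 0}
```

In the sketch, grouping and filtering are separate steps: the per-aggregation filter never removes a group. That is why the groups `3` and `null` survive with a count of `0` in the second result.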

### Group by multiple values

Separate multiple grouping expressions with a comma:

:::{include} ../examples/stats.csv-spec/statsGroupByMultipleValues.md
:::

$$$esql-stats-mv-group$$$
#### Multivalued inputs
### Multivalued inputs

If the grouping key is multivalued then the input row is in all groups:

@@ -137,24 +176,23 @@ key. If you want to send the group key to the function then `MV_EXPAND` first:
Refer to [elasticsearch/issues/134792](https://github.com/elastic/elasticsearch/issues/134792#issuecomment-3361168090)
for an even more in-depth explanation.

#### Multivalue functions
### Multivalue functions

Both the aggregating functions and the grouping expressions accept other
functions. This is useful for using `STATS` on multivalue columns.
For example, to calculate the average salary change, you can use `MV_AVG` to
first average the multiple values per employee, and use the result with the
`AVG` function:
Both aggregation and grouping expressions accept other functions, which is
useful for using `STATS` on multivalue columns. For example, to calculate the
average salary change, use `MV_AVG` to first average the multiple values per
employee, then pass the result to `AVG`:

:::{include} ../examples/stats.csv-spec/docsStatsAvgNestedExpression.md
:::

An example of grouping by an expression is grouping employees on the first
letter of their last name:
Grouping expressions aren't limited to column references; any expression
works. For example, group by a derived value using `LEFT`:

:::{include} ../examples/stats.csv-spec/docsStatsByExpression.md
:::

#### Naming
### Output column naming

Specifying the output column name is optional. If not specified, the new column
name is equal to the expression. The following query returns a column named
@@ -2504,6 +2504,43 @@ under_40K:long |inbetween:long |over_60K:long |total:long
// end::aggFilteringNoGroup-result[]
;

docsStatsWithFilteringOnGroup
required_capability: per_agg_filtering
// tag::aggFilteringOnGroup[]
FROM employees
| STATS `count` = COUNT(*) WHERE languages < 3 BY languages
// end::aggFilteringOnGroup[]
| SORT languages
;

// tag::aggFilteringOnGroup-result[]
count:long | languages:integer
15 | 1
19 | 2
0 | 3
0 | 4
0 | 5
0 | null
// end::aggFilteringOnGroup-result[]
;

docsStatsWithFilteringBefore
required_capability: per_agg_filtering
// tag::aggFilteringBefore[]
FROM employees
| WHERE languages < 3
| STATS `count` = COUNT(*) BY languages
// end::aggFilteringBefore[]
| SORT languages
;

// tag::aggFilteringBefore-result[]
count:long | languages:integer
15 | 1
19 | 2
// end::aggFilteringBefore-result[]
;

statsWithFiltering
required_capability: per_agg_filtering
from employees