diff --git a/docs/reference/query-languages/esql/_snippets/commands/examples/stats.csv-spec/aggFilteringBefore.md b/docs/reference/query-languages/esql/_snippets/commands/examples/stats.csv-spec/aggFilteringBefore.md new file mode 100644 index 0000000000000..c5d854148d705 --- /dev/null +++ b/docs/reference/query-languages/esql/_snippets/commands/examples/stats.csv-spec/aggFilteringBefore.md @@ -0,0 +1,12 @@ +% This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it. + +```esql +FROM employees +| WHERE languages < 3 +| STATS `count` = COUNT(*) BY languages +``` + +| count:long | languages:integer | +| --- | --- | +| 15 | 1 | +| 19 | 2 | diff --git a/docs/reference/query-languages/esql/_snippets/commands/examples/stats.csv-spec/aggFilteringOnGroup.md b/docs/reference/query-languages/esql/_snippets/commands/examples/stats.csv-spec/aggFilteringOnGroup.md new file mode 100644 index 0000000000000..757d284aecaec --- /dev/null +++ b/docs/reference/query-languages/esql/_snippets/commands/examples/stats.csv-spec/aggFilteringOnGroup.md @@ -0,0 +1,15 @@ +% This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it. + +```esql +FROM employees +| STATS `count` = COUNT(*) WHERE languages < 3 BY languages +``` + +| count:long | languages:integer | +| --- | --- | +| 15 | 1 | +| 19 | 2 | +| 0 | 3 | +| 0 | 4 | +| 0 | 5 | +| 0 | null | diff --git a/docs/reference/query-languages/esql/_snippets/commands/layout/stats-by.md b/docs/reference/query-languages/esql/_snippets/commands/layout/stats-by.md index f60ea4b1a6bf9..f4a8835828dc3 100644 --- a/docs/reference/query-languages/esql/_snippets/commands/layout/stats-by.md +++ b/docs/reference/query-languages/esql/_snippets/commands/layout/stats-by.md @@ -6,7 +6,7 @@ stack: ga The `STATS` processing command groups rows according to a common value and calculates one or more aggregated values over the grouped rows. 
-**Syntax** +## Syntax ```esql STATS [column1 =] expression1 [WHERE boolean_expression1][, @@ -15,7 +15,7 @@ STATS [column1 =] expression1 [WHERE boolean_expression1][, [BY grouping_expression1[, ..., grouping_expressionN]] ``` -**Parameters** +## Parameters `columnX` : The name by which the aggregated value is returned. If omitted, the name is @@ -31,14 +31,24 @@ STATS [column1 =] expression1 [WHERE boolean_expression1][, If its name coincides with one of the computed columns, that column will be ignored. `boolean_expressionX` -: The condition that must be met for a row to be included in the evaluation of `expressionX`. +: The condition that must be met for a row to be included in the evaluation of + `expressionX`. It has no effect on `grouping_expressionX` or on other + aggregation expressions. Consequently, the following are _not_ equivalent: + + ```esql + ... | STATS ... WHERE ... + ``` + + ```esql + ... | WHERE ... | STATS ... + ``` ::::{note} Individual `null` values are skipped when computing aggregations. :::: -**Description** +## Description The `STATS` processing command groups rows according to a common value and calculates one or more aggregated values over the grouped rows. For the @@ -75,42 +85,71 @@ and then grouping - that is not going to be faster. :::: -### Examples +## Examples + +The following examples demonstrate common `STATS` patterns.
+ +### Group by column -Calculating a statistic and grouping by the values of another column: +Combine an aggregation with `BY` to compute a value for each group: :::{include} ../examples/stats.csv-spec/stats.md ::: +### Aggregate without grouping + Omitting `BY` returns one row with the aggregations applied over the entire dataset: :::{include} ../examples/stats.csv-spec/statsWithoutBy.md ::: -It’s possible to calculate multiple values: +### Calculate multiple values + +Separate multiple aggregations with commas to compute them in a single pass: :::{include} ../examples/stats.csv-spec/statsCalcMultipleValues.md ::: -To filter the rows that go into an aggregation, use the `WHERE` clause: +### Filter aggregations with WHERE + +Use per-aggregation `WHERE` to compute conditional metrics from the same +dataset in a single pass: :::{include} ../examples/stats.csv-spec/aggFiltering.md ::: -The aggregations can be mixed, with and without a filter and grouping is -optional as well: +### Mix filtered and unfiltered aggregations + +Filtered and unfiltered aggregations can be freely mixed. Grouping is also +optional: :::{include} ../examples/stats.csv-spec/aggFilteringNoGroup.md ::: -It’s also possible to group by multiple values: +### Filter on the grouping key + +The `WHERE` clause can also filter on the grouping key. 
The group itself will +still appear in the output, but its aggregation is computed over no rows, so `COUNT` returns `0`: + +:::{include} ../examples/stats.csv-spec/aggFilteringOnGroup.md +::: + +Compare this to filtering with `WHERE` before `STATS`, where rows are excluded +before grouping, so non-matching groups don't appear in the output at all: + +:::{include} ../examples/stats.csv-spec/aggFilteringBefore.md +::: + +### Group by multiple values + +Separate multiple grouping expressions with a comma: :::{include} ../examples/stats.csv-spec/statsGroupByMultipleValues.md ::: $$$esql-stats-mv-group$$$ -#### Multivalued inputs +### Multivalued inputs If the grouping key is multivalued then the input row is in all groups: @@ -137,24 +176,23 @@ key. If you want to send the group key to the function then `MV_EXPAND` first: Refer to [elasticsearch/issues/134792](https://github.com/elastic/elasticsearch/issues/134792#issuecomment-3361168090) for an even more in depth explanation. -#### Multivalue functions +### Multivalue functions -Both the aggregating functions and the grouping expressions accept other -functions. This is useful for using `STATS` on multivalue columns. -For example, to calculate the average salary change, you can use `MV_AVG` to -first average the multiple values per employee, and use the result with the -`AVG` function: +Both aggregation and grouping expressions accept other functions, which is +useful for using `STATS` on multivalue columns. For example, to calculate the +average salary change, use `MV_AVG` to first average the multiple values per +employee, then pass the result to `AVG`: :::{include} ../examples/stats.csv-spec/docsStatsAvgNestedExpression.md ::: -An example of grouping by an expression is grouping employees on the first -letter of their last name: +Grouping expressions aren't limited to column references; any expression +works. For example, group employees by the first letter of their last name using `LEFT`: :::{include} ../examples/stats.csv-spec/docsStatsByExpression.md ::: -#### Naming +### Output column naming Specifying the output column name is optional. If not specified, the new column name is equal to the expression. The following query returns a column named diff --git a/x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec b/x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec index 097d7e82acfef..1a5b57f18306a 100644 --- a/x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec +++ b/x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec @@ -2504,6 +2504,43 @@ under_40K:long |inbetween:long |over_60K:long |total:long // end::aggFilteringNoGroup-result[] ; +docsStatsWithFilteringOnGroup +required_capability: per_agg_filtering +// tag::aggFilteringOnGroup[] +FROM employees +| STATS `count` = COUNT(*) WHERE languages < 3 BY languages +// end::aggFilteringOnGroup[] +| SORT languages +; + +// tag::aggFilteringOnGroup-result[] +count:long | languages:integer +15 | 1 +19 | 2 +0 | 3 +0 | 4 +0 | 5 +0 | null +// end::aggFilteringOnGroup-result[] +; + +docsStatsWithFilteringBefore +required_capability: per_agg_filtering +// tag::aggFilteringBefore[] +FROM employees +| WHERE languages < 3 +| STATS `count` = COUNT(*) BY languages +// end::aggFilteringBefore[] +| SORT languages +; + +// tag::aggFilteringBefore-result[] +count:long | languages:integer +15 | 1 +19 | 2 +// end::aggFilteringBefore-result[] +; + statsWithFiltering required_capability: per_agg_filtering from employees