elastic · leemthompo · Jan 30, 2024 · Jan 25, 2024 · Jan 25, 2024 · Jan 26, 2024
@@ -10,7 +10,9 @@ AVG(expression)
 ----
 
 `expression`::
-Numeric expression. If `null`, the function returns `null`.
+Numeric expression.
+// If `null`, the function returns `null`.
+// TODO: Remove comment when https://github.com/elastic/elasticsearch/issues/104900 is fixed.
 
 *Description*
 
@@ -20,7 +22,7 @@ The average of a numeric expression.
 
 The result is always a `double` no matter the input type.
 
-*Example*
+*Examples*
 
 [source.merge.styled,esql]
 ----
@@ -30,3 +32,16 @@ include::{esql-specs}/stats.csv-spec[tag=avg]
 |===
 include::{esql-specs}/stats.csv-spec[tag=avg-result]
 |===
+
+The expression can use inline functions. For example, to calculate the average
+over a multivalued column, first use `MV_AVG` to average the multiple values per
+row, and use the result with the `AVG` function:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression]
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression-result]
+|===
@@ -6,13 +6,13 @@
 
 [source,esql]
 ----
-COUNT_DISTINCT(column[, precision_threshold])
+COUNT_DISTINCT(expression[, precision_threshold])
 ----
 
 *Parameters*
 
-`column`::
-Column for which to count the number of distinct values.
+`expression`::
+Expression that outputs the values on which to perform a distinct count.
 
 `precision_threshold`::
 Precision threshold. Refer to <<esql-agg-count-distinct-approximate>>. The
@@ -23,29 +23,6 @@ same effect as a threshold of 40000. The default value is 3000.
 
 Returns the approximate number of distinct values.
 
-[discrete]
-[[esql-agg-count-distinct-approximate]]
-==== Counts are approximate
-
-Computing exact counts requires loading values into a set and returning its
-size. This doesn't scale when working on high-cardinality sets and/or large
-values as the required memory usage and the need to communicate those
-per-shard sets between nodes would utilize too many resources of the cluster.
-
-This `COUNT_DISTINCT` function is based on the
-https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++]
-algorithm, which counts based on the hashes of the values with some interesting
-properties:
-
-include::../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation]
-
-The `COUNT_DISTINCT` function takes an optional second parameter to configure
-the precision threshold. The precision_threshold options allows to trade memory
-for accuracy, and defines a unique count below which counts are expected to be
-close to accurate. Above this value, counts might become a bit more fuzzy. The
-maximum supported value is 40000, thresholds above this number will have the
-same effect as a threshold of 40000. The default value is `3000`.
-
 *Supported types*
 
 Can take any field type as input.
@@ -71,3 +48,38 @@ include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision
 |===
 include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision-result]
 |===
+
+The expression can use inline functions. This example splits a string into
+multiple values using the `SPLIT` function and counts the unique values:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/stats_count_distinct.csv-spec[tag=docsCountDistinctWithExpression]
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+include::{esql-specs}/stats_count_distinct.csv-spec[tag=docsCountDistinctWithExpression-result]
+|===
+
+[discrete]
+[[esql-agg-count-distinct-approximate]]
+==== Counts are approximate
+
+Computing exact counts requires loading values into a set and returning its
+size. This doesn't scale when working on high-cardinality sets and/or large
+values as the required memory usage and the need to communicate those
+per-shard sets between nodes would utilize too many resources of the cluster.
+
+This `COUNT_DISTINCT` function is based on the
+https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++]
+algorithm, which counts based on the hashes of the values with some interesting
+properties:
+
+include::../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation]
+
+The `COUNT_DISTINCT` function takes an optional second parameter to configure
+the precision threshold. The `precision_threshold` option lets you trade memory
+for accuracy, and defines a unique count below which counts are expected to be
+close to accurate. Above this value, counts might become a bit more fuzzy. The
+maximum supported value is 40000; thresholds above this number have the
+same effect as a threshold of 40000. The default value is `3000`.
@@ -6,14 +6,14 @@
 
 [source,esql]
 ----
-COUNT([input])
+COUNT([expression])
 ----
 
 *Parameters*
 
-`input`::
-Column or literal for which to count the number of values. If omitted, returns a
-count all (the number of rows).
+`expression`::
+Expression that outputs values to be counted.
+If omitted, equivalent to `COUNT(*)` (the number of rows).
 
 *Description*
 
@@ -44,3 +44,15 @@ include::{esql-specs}/docs.csv-spec[tag=countAll]
 |===
 include::{esql-specs}/docs.csv-spec[tag=countAll-result]
 |===
+
+The expression can use inline functions. This example splits a string into
+multiple values using the `SPLIT` function and counts the values:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/stats.csv-spec[tag=docsCountWithExpression]
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+include::{esql-specs}/stats.csv-spec[tag=docsCountWithExpression-result]
+|===
@@ -6,17 +6,17 @@
 
 [source,esql]
 ----
-MAX(column)
+MAX(expression)
 ----
 
 *Parameters*
 
-`column`::
-Column from which to return the maximum value.
+`expression`::
+Expression from which to return the maximum value.
 
 *Description*
 
-Returns the maximum value of a numeric column.
+Returns the maximum value of a numeric expression.
 
 *Example*
 
@@ -28,3 +28,16 @@ include::{esql-specs}/stats.csv-spec[tag=max]
 |===
 include::{esql-specs}/stats.csv-spec[tag=max-result]
 |===
+
+The expression can use inline functions. For example, to calculate the maximum
+over an average of a multivalued column, use `MV_AVG` to first average the
+multiple values per row, and use the result with the `MAX` function:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/stats.csv-spec[tag=docsStatsMaxNestedExpression]
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+include::{esql-specs}/stats.csv-spec[tag=docsStatsMaxNestedExpression-result]
+|===
@@ -6,13 +6,13 @@
 
 [source,esql]
 ----
-MEDIAN_ABSOLUTE_DEVIATION(column)
+MEDIAN_ABSOLUTE_DEVIATION(expression)
 ----
 
 *Parameters*
 
-`column`::
-Column from which to return the median absolute deviation.
+`expression`::
+Expression from which to return the median absolute deviation.
 
 *Description*
 
@@ -44,3 +44,17 @@ include::{esql-specs}/stats_percentile.csv-spec[tag=median-absolute-deviation]
 |===
 include::{esql-specs}/stats_percentile.csv-spec[tag=median-absolute-deviation-result]
 |===
+
+The expression can use inline functions. For example, to calculate the
+median absolute deviation of the maximum values of a multivalued column, first
+use `MV_MAX` to get the maximum value per row, and use the result with the
+`MEDIAN_ABSOLUTE_DEVIATION` function:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsMADNestedExpression]
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsMADNestedExpression-result]
+|===
@@ -6,13 +6,13 @@
 
 [source,esql]
 ----
-MEDIAN(column)
+MEDIAN(expression)
 ----
 
 *Parameters*
 
-`column`::
-Column from which to return the median value.
+`expression`::
+Expression from which to return the median value.
 
 *Description*
 
@@ -37,3 +37,16 @@ include::{esql-specs}/stats_percentile.csv-spec[tag=median]
 |===
 include::{esql-specs}/stats_percentile.csv-spec[tag=median-result]
 |===
+
+The expression can use inline functions. For example, to calculate the median of
+the maximum values of a multivalued column, first use `MV_MAX` to get the
+maximum value per row, and use the result with the `MEDIAN` function:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsMedianNestedExpression]
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsMedianNestedExpression-result]
+|===
@@ -6,17 +6,17 @@
 
 [source,esql]
 ----
-MIN(column)
+MIN(expression)
 ----
 
 *Parameters*
 
-`column`::
-Column from which to return the minimum value.
+`expression`::
+Expression from which to return the minimum value.
 
 *Description*
 
-Returns the minimum value of a numeric column.
+Returns the minimum value of a numeric expression.
 
 *Example*
 
@@ -28,3 +28,16 @@ include::{esql-specs}/stats.csv-spec[tag=min]
 |===
 include::{esql-specs}/stats.csv-spec[tag=min-result]
 |===
+
+The expression can use inline functions. For example, to calculate the minimum
+over an average of a multivalued column, use `MV_AVG` to first average the
+multiple values per row, and use the result with the `MIN` function:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/stats.csv-spec[tag=docsStatsMinNestedExpression]
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+include::{esql-specs}/stats.csv-spec[tag=docsStatsMinNestedExpression-result]
+|===
@@ -6,13 +6,13 @@
 
 [source,esql]
 ----
-PERCENTILE(column, percentile)
+PERCENTILE(expression, percentile)
 ----
 
 *Parameters*
 
-`column`::
-Column to convert from multiple values to single value.
+`expression`::
+Expression from which to return a percentile.
 
 `percentile`::
 A constant numeric expression.
@@ -23,18 +23,6 @@ Returns the value at which a certain percentage of observed values occur. For
 example, the 95th percentile is the value which is greater than 95% of the
 observed values and the 50th percentile is the <<esql-agg-median>>.
 
-[discrete]
-[[esql-agg-percentile-approximate]]
-==== `PERCENTILE` is (usually) approximate
-
-include::../../aggregations/metrics/percentile-aggregation.asciidoc[tag=approximate]
-
-[WARNING]
-====
-`PERCENTILE` is also {wikipedia}/Nondeterministic_algorithm[non-deterministic].
-This means you can get slightly different results using the same data.
-====
-
 *Example*
 
 [source.merge.styled,esql]
@@ -45,3 +33,28 @@ include::{esql-specs}/stats_percentile.csv-spec[tag=percentile]
 |===
 include::{esql-specs}/stats_percentile.csv-spec[tag=percentile-result]
 |===
+
+The expression can use inline functions. For example, to calculate a percentile
+of the maximum values of a multivalued column, first use `MV_MAX` to get the
+maximum value per row, and use the result with the `PERCENTILE` function:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsPercentileNestedExpression]
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsPercentileNestedExpression-result]
+|===
+
+[discrete]
+[[esql-agg-percentile-approximate]]
+==== `PERCENTILE` is (usually) approximate
+
+include::../../aggregations/metrics/percentile-aggregation.asciidoc[tag=approximate]
+
+[WARNING]
+====
+`PERCENTILE` is also {wikipedia}/Nondeterministic_algorithm[non-deterministic].
+This means you can get slightly different results using the same data.
+====