Merged
19 changes: 17 additions & 2 deletions docs/reference/esql/functions/avg.asciidoc
@@ -10,7 +10,9 @@ AVG(expression)
----

`expression`::
Numeric expression. If `null`, the function returns `null`.
Numeric expression.
//If `null`, the function returns `null`.
// TODO: Remove comment when https://github.com/elastic/elasticsearch/issues/104900 is fixed.

*Description*

@@ -20,7 +22,7 @@ The average of a numeric expression.

The result is always a `double` no matter the input type.

*Example*
*Examples*

[source.merge.styled,esql]
----
@@ -30,3 +32,16 @@ include::{esql-specs}/stats.csv-spec[tag=avg]
|===
include::{esql-specs}/stats.csv-spec[tag=avg-result]
|===

The expression can use inline functions. For example, to calculate the average
over a multivalued column, first use `MV_AVG` to average the multiple values per
row, and use the result with the `AVG` function:

[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression-result]
|===
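
As a rough sketch of what such a query might look like, assuming a hypothetical
`employees` index with a multivalued numeric `salary_change` column (neither
name is confirmed by this diff; the real example is pulled in from
`stats.csv-spec`):

[source,esql]
----
// Assumed index and column names, for illustration only.
FROM employees
// MV_AVG collapses each row's values to one number; AVG then averages the rows.
| STATS avg_salary_change = AVG(MV_AVG(salary_change))
----

Note that this yields an average of per-row averages, which can differ from the
average of all individual values when rows hold different numbers of values.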
64 changes: 38 additions & 26 deletions docs/reference/esql/functions/count-distinct.asciidoc
@@ -6,13 +6,13 @@

[source,esql]
----
COUNT_DISTINCT(column[, precision_threshold])
COUNT_DISTINCT(expression[, precision_threshold])
----

*Parameters*

`column`::
Column for which to count the number of distinct values.
`expression`::
Expression that outputs the values on which to perform a distinct count.

`precision_threshold`::
Precision threshold. Refer to <<esql-agg-count-distinct-approximate>>. The
@@ -23,29 +23,6 @@ same effect as a threshold of 40000. The default value is 3000.

Returns the approximate number of distinct values.

[discrete]
[[esql-agg-count-distinct-approximate]]
==== Counts are approximate

Computing exact counts requires loading values into a set and returning its
size. This doesn't scale when working on high-cardinality sets and/or large
values as the required memory usage and the need to communicate those
per-shard sets between nodes would utilize too many resources of the cluster.

This `COUNT_DISTINCT` function is based on the
https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++]
algorithm, which counts based on the hashes of the values with some interesting
properties:

include::../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation]

The `COUNT_DISTINCT` function takes an optional second parameter to configure
the precision threshold. The `precision_threshold` option lets you trade memory
for accuracy, and defines a unique count below which counts are expected to be
close to accurate. Above this value, counts might become a bit more fuzzy. The
maximum supported value is 40000; thresholds above this number have the
same effect as a threshold of 40000. The default value is `3000`.

*Supported types*

Can take any field type as input.
@@ -71,3 +48,38 @@ include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision
|===
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision-result]
|===

The expression can use inline functions. This example splits a string into
multiple values using the `SPLIT` function and counts the unique values:

[source.merge.styled,esql]
----
include::{esql-specs}/stats_count_distinct.csv-spec[tag=docsCountDistinctWithExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_count_distinct.csv-spec[tag=docsCountDistinctWithExpression-result]
|===
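
A minimal sketch of such a query, using a made-up `words` string and column
alias (both are assumptions for illustration, not taken from this diff):

[source,esql]
----
// Hypothetical input: six semicolon-separated words, one of them repeated.
ROW words = "foo;bar;baz;qux;quux;foo"
// SPLIT turns the string into a multivalued field; COUNT_DISTINCT counts its unique values.
| STATS distinct_word_count = COUNT_DISTINCT(SPLIT(words, ";"))
----

The repeated `foo` should contribute only once to the distinct count.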

[discrete]
[[esql-agg-count-distinct-approximate]]
==== Counts are approximate

Computing exact counts requires loading values into a set and returning its
size. This doesn't scale when working on high-cardinality sets and/or large
values as the required memory usage and the need to communicate those
per-shard sets between nodes would utilize too many resources of the cluster.

This `COUNT_DISTINCT` function is based on the
https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++]
algorithm, which counts based on the hashes of the values with some interesting
properties:

include::../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation]

The `COUNT_DISTINCT` function takes an optional second parameter to configure
the precision threshold. The `precision_threshold` option lets you trade memory
for accuracy, and defines a unique count below which counts are expected to be
close to accurate. Above this value, counts might become a bit more fuzzy. The
maximum supported value is 40000; thresholds above this number have the
same effect as a threshold of 40000. The default value is `3000`.
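
To illustrate the optional second parameter, here is a sketch that assumes a
hypothetical `hosts` index with `ip0` and `ip1` columns (these names are
placeholders, not confirmed by this diff):

[source,esql]
----
// Assumed index and column names, for illustration only.
FROM hosts
// 80000 is above the maximum, so it behaves like a threshold of 40000;
// 5 trades accuracy for lower memory use.
| STATS COUNT_DISTINCT(ip0, 80000), COUNT_DISTINCT(ip1, 5)
----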
20 changes: 16 additions & 4 deletions docs/reference/esql/functions/count.asciidoc
@@ -6,14 +6,14 @@

[source,esql]
----
COUNT([input])
COUNT([expression])
----

*Parameters*

`input`::
Column or literal for which to count the number of values. If omitted, returns a
count all (the number of rows).
`expression`::
Expression that outputs values to be counted.
If omitted, equivalent to `COUNT(*)` (the number of rows).

*Description*

@@ -44,3 +44,15 @@ include::{esql-specs}/docs.csv-spec[tag=countAll]
|===
include::{esql-specs}/docs.csv-spec[tag=countAll-result]
|===

The expression can use inline functions. This example splits a string into
multiple values using the `SPLIT` function and counts the values:

[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=docsCountWithExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=docsCountWithExpression-result]
|===
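
One way such a query might be written, with a made-up `words` string and column
alias (assumptions for illustration only):

[source,esql]
----
// Hypothetical input: six semicolon-separated words.
ROW words = "foo;bar;baz;qux;quux;foo"
// Unlike COUNT_DISTINCT, COUNT includes the repeated "foo" in its total.
| STATS word_count = COUNT(SPLIT(words, ";"))
----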
21 changes: 17 additions & 4 deletions docs/reference/esql/functions/max.asciidoc
@@ -6,17 +6,17 @@

[source,esql]
----
MAX(column)
MAX(expression)
----

*Parameters*

`column`::
Column from which to return the maximum value.
`expression`::
Expression from which to return the maximum value.

*Description*

Returns the maximum value of a numeric column.
Returns the maximum value of a numeric expression.

*Example*

@@ -28,3 +28,16 @@ include::{esql-specs}/stats.csv-spec[tag=max]
|===
include::{esql-specs}/stats.csv-spec[tag=max-result]
|===

The expression can use inline functions. For example, to calculate the maximum
over an average of a multivalued column, use `MV_AVG` to first average the
multiple values per row, and use the result with the `MAX` function:

[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=docsStatsMaxNestedExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=docsStatsMaxNestedExpression-result]
|===
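
Sketched as a query, assuming a hypothetical `employees` index with a
multivalued `salary_change` column (placeholder names, not confirmed by this
diff):

[source,esql]
----
// Assumed index and column names, for illustration only.
FROM employees
// Average each row's values first, then take the largest of those averages.
| STATS max_avg_salary_change = MAX(MV_AVG(salary_change))
----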
Original file line number Diff line number Diff line change
@@ -6,13 +6,13 @@

[source,esql]
----
MEDIAN_ABSOLUTE_DEVIATION(column)
MEDIAN_ABSOLUTE_DEVIATION(expression)
----

*Parameters*

`column`::
Column from which to return the median absolute deviation.
`expression`::
Expression from which to return the median absolute deviation.

*Description*

@@ -44,3 +44,17 @@ include::{esql-specs}/stats_percentile.csv-spec[tag=median-absolute-deviation]
|===
include::{esql-specs}/stats_percentile.csv-spec[tag=median-absolute-deviation-result]
|===

The expression can use inline functions. For example, to calculate the
median absolute deviation of the maximum values of a multivalued column, first
use `MV_MAX` to get the maximum value per row, and use the result with the
`MEDIAN_ABSOLUTE_DEVIATION` function:

[source.merge.styled,esql]
----
include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsMADNestedExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsMADNestedExpression-result]
|===
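
A possible shape for this query, assuming a hypothetical `employees` index with
a multivalued `salary_change` column (the names and the alias are illustrative
assumptions):

[source,esql]
----
// Assumed index and column names, for illustration only.
FROM employees
// Reduce each row to its maximum value, then measure the spread of those maxima.
| STATS mad_max_salary_change = MEDIAN_ABSOLUTE_DEVIATION(MV_MAX(salary_change))
----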
19 changes: 16 additions & 3 deletions docs/reference/esql/functions/median.asciidoc
@@ -6,13 +6,13 @@

[source,esql]
----
MEDIAN(column)
MEDIAN(expression)
----

*Parameters*

`column`::
Column from which to return the median value.
`expression`::
Expression from which to return the median value.

*Description*

@@ -37,3 +37,16 @@ include::{esql-specs}/stats_percentile.csv-spec[tag=median]
|===
include::{esql-specs}/stats_percentile.csv-spec[tag=median-result]
|===

The expression can use inline functions. For example, to calculate the median of
the maximum values of a multivalued column, first use `MV_MAX` to get the
maximum value per row, and use the result with the `MEDIAN` function:

[source.merge.styled,esql]
----
include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsMedianNestedExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsMedianNestedExpression-result]
|===
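
In query form this might look like the following, where the `employees` index
and `salary_change` column are assumed placeholders rather than names taken
from this diff:

[source,esql]
----
// Assumed index and column names, for illustration only.
FROM employees
// MV_MAX picks one value per row; MEDIAN is then computed across rows.
| STATS median_max_salary_change = MEDIAN(MV_MAX(salary_change))
----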
21 changes: 17 additions & 4 deletions docs/reference/esql/functions/min.asciidoc
@@ -6,17 +6,17 @@

[source,esql]
----
MIN(column)
MIN(expression)
----

*Parameters*

`column`::
Column from which to return the minimum value.
`expression`::
Expression from which to return the minimum value.

*Description*

Returns the minimum value of a numeric column.
Returns the minimum value of a numeric expression.

*Example*

@@ -28,3 +28,16 @@ include::{esql-specs}/stats.csv-spec[tag=min]
|===
include::{esql-specs}/stats.csv-spec[tag=min-result]
|===

The expression can use inline functions. For example, to calculate the minimum
over an average of a multivalued column, use `MV_AVG` to first average the
multiple values per row, and use the result with the `MIN` function:

[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=docsStatsMinNestedExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=docsStatsMinNestedExpression-result]
|===
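
A sketch of the corresponding query, again assuming a hypothetical `employees`
index with a multivalued `salary_change` column:

[source,esql]
----
// Assumed index and column names, for illustration only.
FROM employees
// Average each row's values first, then take the smallest of those averages.
| STATS min_avg_salary_change = MIN(MV_AVG(salary_change))
----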
43 changes: 28 additions & 15 deletions docs/reference/esql/functions/percentile.asciidoc
@@ -6,13 +6,13 @@

[source,esql]
----
PERCENTILE(column, percentile)
PERCENTILE(expression, percentile)
----

*Parameters*

`column`::
Column to convert from multiple values to single value.
`expression`::
Expression from which to return a percentile.

`percentile`::
A constant numeric expression.
@@ -23,18 +23,6 @@ Returns the value at which a certain percentage of observed values occur. For
example, the 95th percentile is the value which is greater than 95% of the
observed values and the 50th percentile is the <<esql-agg-median>>.

[discrete]
[[esql-agg-percentile-approximate]]
==== `PERCENTILE` is (usually) approximate

include::../../aggregations/metrics/percentile-aggregation.asciidoc[tag=approximate]

[WARNING]
====
`PERCENTILE` is also {wikipedia}/Nondeterministic_algorithm[non-deterministic].
This means you can get slightly different results using the same data.
====

*Example*

[source.merge.styled,esql]
@@ -45,3 +33,28 @@ include::{esql-specs}/stats_percentile.csv-spec[tag=percentile]
|===
include::{esql-specs}/stats_percentile.csv-spec[tag=percentile-result]
|===

The expression can use inline functions. For example, to calculate a percentile
of the maximum values of a multivalued column, first use `MV_MAX` to get the
maximum value per row, and use the result with the `PERCENTILE` function:

[source.merge.styled,esql]
----
include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsPercentileNestedExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsPercentileNestedExpression-result]
|===
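
For instance, a query along these lines would compute the 80th percentile of
the per-row maxima. The `employees` index, the `salary_change` column, and the
choice of 80 are all assumptions made for illustration:

[source,esql]
----
// Assumed index and column names, for illustration only.
FROM employees
// The second argument is the constant percentile to compute.
| STATS p80_max_salary_change = PERCENTILE(MV_MAX(salary_change), 80)
----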

[discrete]
[[esql-agg-percentile-approximate]]
==== `PERCENTILE` is (usually) approximate

include::../../aggregations/metrics/percentile-aggregation.asciidoc[tag=approximate]

[WARNING]
====
`PERCENTILE` is also {wikipedia}/Nondeterministic_algorithm[non-deterministic].
This means you can get slightly different results using the same data.
====