SQL: Implement FIRST/LAST aggregate functions #37936
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -113,6 +113,129 @@ Returns the total number of _distinct non-null_ values in input values. | |
| include-tagged::{sql-specs}/docs.csv-spec[aggCountDistinct] | ||
| -------------------------------------------------- | ||
|
|
||
| [[sql-functions-aggs-first]] | ||
| ===== `FIRST/FIRST_VALUE` | ||
|
|
||
| .Synopsis: | ||
| [source, sql] | ||
| -------------------------------------------------- | ||
|
||
| FIRST(field_name<1>, sort_by_field_name<2>) | ||
|
||
| -------------------------------------------------- | ||
|
|
||
| *Input*: | ||
|
|
||
| <1> a field name | ||
|
||
| <2> a field name; optional | ||
|
|
||
| *Output*: same type as the input | ||
|
|
||
| .Description: | ||
|
|
||
| When only one argument is provided it returns the first **non-NULL** value across input values in the field | ||
|
||
| `field_name`. It will return **NULL** only if all values in `field_name` are null. When a second argument | ||
|
||
| is provided then it returns the first **non-NULL** value across input values in the field `field_name` ordered | ||
| ascending by the **non-NULL** values of `sort_by_field_name`. E.g.: | ||
|
|
||
| [cols="<,<"] | ||
| |=== | ||
| s|a|b | ||
|
||
| | 100 | 1 | ||
| | 200 | 1 | ||
| | 1 | 2 | ||
| | 2 | 2 | ||
| | 10 | null | ||
| | 20 | null | ||
| | | ||
| |=== | ||
|
|
||
| [source, sql] | ||
| ------------------------- | ||
| SELECT FIRST(a, b) FROM t | ||
| ------------------------- | ||
|
|
||
| will result in: | ||
| [cols="<"] | ||
| |=== | ||
| s|FIRST(a, b) | ||
| | 100 | ||
| |=== | ||
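The two-argument `FIRST(a, b)` example above can be reproduced with a small in-memory sketch. This is only an illustration of the documented semantics, not the Elasticsearch implementation: the `Row` class and `first` helper are hypothetical. Rows with a null `b` are skipped, which is one reading of "the **non-NULL** values of `sort_by_field_name`". Breaking ties on `b` by the smaller `a` is an assumption chosen to match the documented result.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class FirstSketch {
    // One row of the example table; either column may be null.
    static final class Row {
        final Integer a;
        final Integer b;

        Row(Integer a, Integer b) {
            this.a = a;
            this.b = b;
        }
    }

    // FIRST(a, b): the non-null a whose non-null b sorts first (ascending).
    // Assumption: ties on b are broken by the smaller a.
    static Integer first(List<Row> rows) {
        return rows.stream()
            .filter(r -> r.a != null && r.b != null)
            .min(Comparator.<Row, Integer>comparing(r -> r.b).thenComparing(r -> r.a))
            .map(r -> r.a)
            .orElse(null);
    }

    // The six rows from the documentation table.
    static List<Row> exampleTable() {
        return Arrays.asList(
            new Row(100, 1), new Row(200, 1),
            new Row(1, 2), new Row(2, 2),
            new Row(10, null), new Row(20, null));
    }

    public static void main(String[] args) {
        System.out.println(first(exampleTable())); // prints 100
    }
}
```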
|
|
||
|
|
||
| ["source","sql",subs="attributes,macros"] | ||
| ----------------------------------------------------------- | ||
| include-tagged::{sql-specs}/docs.csv-spec[firstWithOneArg] | ||
| ----------------------------------------------------------- | ||
|
|
||
| ["source","sql",subs="attributes,macros"] | ||
| ----------------------------------------------------------- | ||
| include-tagged::{sql-specs}/docs.csv-spec[firstWithTwoArgs] | ||
| ----------------------------------------------------------- | ||
|
|
||
| [NOTE] | ||
| `FIRST` cannot be used to create a filter in a `HAVING` clause of a `GROUP BY` query. | ||
|
||
|
|
||
| [[sql-functions-aggs-last]] | ||
| ===== `LAST/LAST_VALUE` | ||
|
|
||
| .Synopsis: | ||
| [source, sql] | ||
| -------------------------------------------------- | ||
| LAST(field_name<1>, sort_by_field_name<2>) | ||
|
||
| -------------------------------------------------- | ||
|
|
||
| *Input*: | ||
|
|
||
| <1> a field name | ||
| <2> a field name; optional | ||
|
|
||
| *Output*: same type as the input | ||
|
|
||
| .Description: | ||
|
|
||
| It's the inverse of <<sql-functions-aggs-first>>. When only one argument is provided it returns the | ||
|
||
| last **non-NULL** value across input values in the field `field_name`. It will return **NULL** only if | ||
| all values in `field_name` are null. When a second argument is provided then it returns the last | ||
| **non-NULL** value across input values in the field `field_name` ordered descending by the **non-NULL** | ||
| values of `sort_by_field_name`. E.g.: | ||
|
|
||
| [cols="<,<"] | ||
| |=== | ||
| s|a|b | ||
| | 10 | 1 | ||
| | 20 | 1 | ||
| | 1 | 2 | ||
| | 2 | 2 | ||
| | 100 | null | ||
| | 200 | null | ||
| |=== | ||
|
|
||
| [source, sql] | ||
| ------------------------ | ||
| SELECT LAST(a, b) FROM t | ||
| ------------------------ | ||
|
|
||
| will result in: | ||
| [cols="<"] | ||
| |=== | ||
| s|LAST(a, b) | ||
| | 2 | ||
| |=== | ||
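The `LAST(a, b)` example above can be checked the same way with a single pass over the two columns. This is a hedged sketch, not the actual implementation: the `last` helper and the parallel-array layout are hypothetical. Skipping rows with a null `b` and breaking ties on `b` by the larger `a` are assumptions that happen to reproduce the documented result.

```java
final class LastSketch {
    // LAST(a, b) over parallel columns: keep the row with the greatest
    // non-null b; on equal b keep the greater a (assumed tie-break).
    static Integer last(Integer[] a, Integer[] b) {
        Integer bestA = null;
        Integer bestB = null;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == null || b[i] == null) {
                continue; // rows with a null value or a null sort key are ignored
            }
            boolean better = bestB == null
                || b[i] > bestB
                || (b[i].equals(bestB) && a[i] > bestA);
            if (better) {
                bestA = a[i];
                bestB = b[i];
            }
        }
        return bestA; // null when no row qualifies
    }

    public static void main(String[] args) {
        Integer[] a = {10, 20, 1, 2, 100, 200};
        Integer[] b = {1, 1, 2, 2, null, null};
        System.out.println(last(a, b)); // prints 2
    }
}
```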
|
|
||
|
|
||
| ["source","sql",subs="attributes,macros"] | ||
| ----------------------------------------------------------- | ||
| include-tagged::{sql-specs}/docs.csv-spec[lastWithOneArg] | ||
| ----------------------------------------------------------- | ||
|
|
||
| ["source","sql",subs="attributes,macros"] | ||
| ----------------------------------------------------------- | ||
| include-tagged::{sql-specs}/docs.csv-spec[lastWithTwoArgs] | ||
| ----------------------------------------------------------- | ||
|
|
||
| [NOTE] | ||
| `LAST` cannot be used to create a filter in a `HAVING` clause of a `GROUP BY` query. | ||
|
|
||
|
|
||
| [[sql-functions-aggs-max]] | ||
| ===== `MAX` | ||
|
|
||
|
|
@@ -161,6 +284,9 @@ Returns the minimum value across input values in the field `field_name`. | |
| include-tagged::{sql-specs}/docs.csv-spec[aggMin] | ||
| -------------------------------------------------- | ||
|
|
||
| [NOTE] | ||
| `MIN` on a field of type <<text, `text`>> or <<keyword, `keyword`>> is translated into <<sql-functions-aggs-first>>. | ||
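One way to read the note above: for string-like fields, the one-argument `FIRST` and `MIN` pick the same value, because both select the smallest non-null value in the field's sort order. The sketch below illustrates that equivalence under the assumption of Java's natural `String` ordering, which may differ from how Elasticsearch orders `keyword` terms. The helper names and sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

final class MinAsFirstSketch {
    // One-argument FIRST, read as "first non-null value in the field's own sort order".
    static String first(List<String> values) {
        return values.stream()
            .filter(Objects::nonNull)
            .sorted()
            .findFirst()
            .orElse(null);
    }

    // MIN over the same values, skipping nulls.
    static String min(List<String> values) {
        return values.stream()
            .filter(Objects::nonNull)
            .min(Comparator.naturalOrder())
            .orElse(null);
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not the emp data set.
        List<String> names = Arrays.asList("Remzi", null, "Zvonko", "Alejandro", "Hilari");
        System.out.println(first(names)); // prints Alejandro
        System.out.println(min(names));   // prints Alejandro
    }
}
```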
|
||
|
|
||
| [[sql-functions-aggs-sum]] | ||
| ===== `SUM` | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -185,7 +185,11 @@ SHOW FUNCTIONS; | |
| name | type | ||
| -----------------+--------------- | ||
| AVG |AGGREGATE | ||
| COUNT |AGGREGATE | ||
| COUNT |AGGREGATE | ||
| FIRST |AGGREGATE | ||
| FIRST_VALUE |AGGREGATE | ||
| LAST |AGGREGATE | ||
| LAST_VALUE |AGGREGATE | ||
| MAX |AGGREGATE | ||
| MIN |AGGREGATE | ||
| SUM |AGGREGATE | ||
|
|
@@ -699,6 +703,8 @@ SELECT MIN(salary) AS min, MAX(salary) AS max FROM emp HAVING min > 25000; | |
| // end::groupByHavingImplicitNoMatch | ||
| //; | ||
|
|
||
|
|
||
|
|
||
| /////////////////////////////// | ||
| // | ||
| // Grouping | ||
|
|
@@ -998,6 +1004,55 @@ SELECT COUNT(DISTINCT hire_date) unique_hires, COUNT(hire_date) AS hires FROM em | |
| // end::aggCountDistinct | ||
| ; | ||
|
|
||
| firstWithOneArg | ||
| schema::FIRST(first_name):s | ||
| // tag::firstWithOneArg | ||
| SELECT FIRST(first_name) FROM emp; | ||
|
Member
I think the examples would be more meaningful along with a GROUP BY.
Contributor
Author
Generally we don't have GROUP BY for the min/max/avg/kurtosis etc. examples.
Member
We have at least one query with GROUP BY for each agg - the idea is to underline that this is in fact an agg (which might not be obvious when using implicit grouping).
||
|
|
||
| FIRST(first_name) | ||
| ----------------- | ||
| Alejandro | ||
|
|
||
| // end::firstWithOneArg | ||
| ; | ||
|
|
||
| firstWithTwoArgs | ||
| schema::FIRST(first_name, birth_date):s | ||
| // tag::firstWithTwoArgs | ||
| SELECT FIRST(first_name, birth_date) FROM emp; | ||
|
|
||
| FIRST(first_name, birth_date) | ||
| ----------------------------- | ||
| Remzi | ||
|
|
||
| // end::firstWithTwoArgs | ||
| ; | ||
|
|
||
| lastWithOneArg | ||
| schema::LAST(first_name):s | ||
| // tag::lastWithOneArg | ||
| SELECT LAST(first_name) FROM emp; | ||
|
|
||
| LAST(first_name) | ||
| --------------- | ||
| Zvonko | ||
|
|
||
| // end::lastWithOneArg | ||
| ; | ||
|
|
||
|
|
||
| lastWithTwoArgs | ||
| schema::LAST(first_name, birth_date):s | ||
| // tag::lastWithTwoArgs | ||
| SELECT LAST(first_name, birth_date) FROM emp; | ||
|
|
||
| LAST(first_name, birth_date) | ||
| --------------------------- | ||
| Hilari | ||
|
|
||
| // end::lastWithTwoArgs | ||
| ; | ||
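The specs above all use implicit grouping. To illustrate the review point that `FIRST` is an aggregate, the sketch below computes a one-argument `FIRST` per group in memory, roughly what a query such as `SELECT gender, FIRST(first_name) FROM emp GROUP BY gender` might compute. The `gender` column, the `Emp` class, and the sample rows are assumptions made for illustration; they are not taken from the spec file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GroupedFirstSketch {
    // Hypothetical employee row: a non-null grouping key plus a nullable name.
    static final class Emp {
        final String gender;
        final String firstName;

        Emp(String gender, String firstName) {
            this.gender = gender;
            this.firstName = firstName;
        }
    }

    // Per group, keep the smallest non-null first name (one-argument FIRST,
    // assuming natural String ordering).
    static Map<String, String> firstNameByGender(List<Emp> emps) {
        Map<String, String> result = new TreeMap<>();
        for (Emp e : emps) {
            if (e.firstName == null) {
                continue; // nulls never win
            }
            result.merge(e.gender, e.firstName, (x, y) -> x.compareTo(y) <= 0 ? x : y);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Emp> emps = Arrays.asList(
            new Emp("F", "Bea"), new Emp("M", "Dan"),
            new Emp("F", "Ana"), new Emp("M", "Carl"),
            new Emp("M", null));
        System.out.println(firstNameByGender(emps)); // prints {F=Ana, M=Carl}
    }
}
```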
|
|
||
| aggMax | ||
| // tag::aggMax | ||
| SELECT MAX(salary) AS max FROM emp; | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,6 +19,7 @@ | |
| import org.elasticsearch.xpack.sql.expression.function.Functions; | ||
| import org.elasticsearch.xpack.sql.expression.function.Score; | ||
| import org.elasticsearch.xpack.sql.expression.function.aggregate.AggregateFunctionAttribute; | ||
| import org.elasticsearch.xpack.sql.expression.function.aggregate.TopHits; | ||
| import org.elasticsearch.xpack.sql.expression.function.grouping.GroupingFunctionAttribute; | ||
| import org.elasticsearch.xpack.sql.expression.function.scalar.ScalarFunction; | ||
| import org.elasticsearch.xpack.sql.expression.predicate.conditional.ConditionalFunction; | ||
|
|
@@ -366,16 +367,26 @@ private static boolean checkGroupByHaving(LogicalPlan p, Set<Failure> localFailu | |
| if (f.child() instanceof Aggregate) { | ||
| Aggregate a = (Aggregate) f.child(); | ||
|
|
||
| Map<Expression, Node<?>> missing = new LinkedHashMap<>(); | ||
| Set<Expression> missing = new LinkedHashSet<>(); | ||
| Set<Expression> unsupported = new LinkedHashSet<>(); | ||
| Expression condition = f.condition(); | ||
| // variation of checkGroupMatch customized for HAVING, which requires just aggregations | ||
| condition.collectFirstChildren(c -> checkGroupByHavingHasOnlyAggs(c, condition, missing, functions)); | ||
| condition.collectFirstChildren(c -> checkGroupByHavingHasOnlyAggs(c, missing, unsupported, functions)); | ||
|
|
||
| if (!missing.isEmpty()) { | ||
| String plural = missing.size() > 1 ? "s" : StringUtils.EMPTY; | ||
| localFailures.add( | ||
| fail(condition, "Cannot use HAVING filter on non-aggregate" + plural + " %s; use WHERE instead", | ||
| Expressions.names(missing.keySet()))); | ||
| Expressions.names(missing))); | ||
| groupingFailures.add(a); | ||
| return false; | ||
| } | ||
|
|
||
| if (!unsupported.isEmpty()) { | ||
| String plural = unsupported.size() > 1 ? "s" : StringUtils.EMPTY; | ||
| localFailures.add( | ||
| fail(condition, "HAVING filter is unsupported for function" + plural + " %s", | ||
| Expressions.names(unsupported))); | ||
| groupingFailures.add(a); | ||
| return false; | ||
| } | ||
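The hunk above splits HAVING failures into two messages with the same pluralisation rule. A standalone sketch of how those strings come out is shown below. It assumes `fail(...)` formats its arguments like `String.format` and that `Expressions.names(...)` yields a list of names; `HavingMessages` and its methods are hypothetical and only mirror the message templates in the diff.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class HavingMessages {
    // Same rule as the diff: append "s" only when more than one name is reported.
    static String plural(List<String> names) {
        return names.size() > 1 ? "s" : "";
    }

    // Non-aggregate expressions used in HAVING.
    static String missingMessage(List<String> names) {
        return String.format(
            "Cannot use HAVING filter on non-aggregate" + plural(names) + " %s; use WHERE instead",
            names);
    }

    // Functions such as FIRST/LAST (or SCORE) that HAVING does not support.
    static String unsupportedMessage(List<String> names) {
        return String.format(
            "HAVING filter is unsupported for function" + plural(names) + " %s",
            names);
    }

    public static void main(String[] args) {
        System.out.println(missingMessage(Collections.singletonList("salary")));
        System.out.println(unsupportedMessage(Arrays.asList("FIRST(a)", "LAST(b)")));
    }
}
```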
|
|
@@ -385,8 +396,8 @@ private static boolean checkGroupByHaving(LogicalPlan p, Set<Failure> localFailu | |
| } | ||
|
|
||
|
|
||
| private static boolean checkGroupByHavingHasOnlyAggs(Expression e, Node<?> source, | ||
| Map<Expression, Node<?>> missing, Map<String, Function> functions) { | ||
| private static boolean checkGroupByHavingHasOnlyAggs(Expression e, Set<Expression> missing, | ||
| Set<Expression> unsupported, Map<String, Function> functions) { | ||
|
|
||
| // resolve FunctionAttribute to backing functions | ||
| if (e instanceof FunctionAttribute) { | ||
|
|
@@ -407,13 +418,17 @@ private static boolean checkGroupByHavingHasOnlyAggs(Expression e, Node<?> sourc | |
|
|
||
| // unwrap function to find the base | ||
| for (Expression arg : sf.arguments()) { | ||
| arg.collectFirstChildren(c -> checkGroupByHavingHasOnlyAggs(c, source, missing, functions)); | ||
| arg.collectFirstChildren(c -> checkGroupByHavingHasOnlyAggs(c, missing, unsupported, functions)); | ||
| } | ||
| return true; | ||
|
|
||
| } else if (e instanceof Score) { | ||
| // Score can't be used for having | ||
| missing.put(e, source); | ||
| // Score can't be used in having | ||
| unsupported.add(e); | ||
| return true; | ||
| } else if (e instanceof TopHits) { | ||
|
Member
I believe this will not catch
Contributor
Author
You're right as the Verifier is checking before the optimisations.
||
| // First and last cannot be used in having | ||
| unsupported.add(e); | ||
| return true; | ||
| } | ||
|
|
||
|
|
@@ -428,7 +443,7 @@ private static boolean checkGroupByHavingHasOnlyAggs(Expression e, Node<?> sourc | |
|
|
||
| // left without leaves which have to match; that's a failure since everything should be based on an agg | ||
| if (e instanceof Attribute) { | ||
| missing.put(e, source); | ||
| missing.add(e); | ||
| return true; | ||
| } | ||
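The classification performed by `checkGroupByHavingHasOnlyAggs` after this change can be approximated with a much smaller model. The sketch below is a simplification under stated assumptions: `Kind` and `Expr` are hypothetical stand-ins for the real expression classes, the `FunctionAttribute` resolution step is omitted, and the boolean return value used by `collectFirstChildren` is dropped. It only shows which bucket each kind of leaf ends up in.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class HavingCheckSketch {
    // Hypothetical node kinds standing in for the real expression classes.
    enum Kind { ATTRIBUTE, SCALAR_FUNCTION, AGGREGATE, SCORE, TOP_HITS }

    static final class Expr {
        final Kind kind;
        final String name;
        final List<Expr> args;

        Expr(Kind kind, String name, Expr... args) {
            this.kind = kind;
            this.name = name;
            this.args = Arrays.asList(args);
        }
    }

    // Walks a HAVING condition and sorts offending nodes into two buckets.
    static void check(Expr e, Set<String> missing, Set<String> unsupported) {
        switch (e.kind) {
            case SCALAR_FUNCTION:
                // unwrap the function to find what it is based on
                for (Expr arg : e.args) {
                    check(arg, missing, unsupported);
                }
                break;
            case SCORE:
            case TOP_HITS:
                // SCORE and FIRST/LAST cannot be used in HAVING
                unsupported.add(e.name);
                break;
            case ATTRIBUTE:
                // a bare attribute is not based on an aggregate
                missing.add(e.name);
                break;
            case AGGREGATE:
                // regular aggregates are acceptable
                break;
        }
    }

    public static void main(String[] args) {
        Set<String> missing = new LinkedHashSet<>();
        Set<String> unsupported = new LinkedHashSet<>();
        Expr first = new Expr(Kind.TOP_HITS, "FIRST(a)");
        check(new Expr(Kind.SCALAR_FUNCTION, "ABS", first), missing, unsupported);
        check(new Expr(Kind.ATTRIBUTE, "salary"), missing, unsupported);
        System.out.println(missing);     // prints [salary]
        System.out.println(unsupported); // prints [FIRST(a)]
    }
}
```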
|
|
||
|
|
||
Additionally, please add an example for `FIRST_VALUE`.