Add support for GROUP BY AUTO aggregation
#18390
This is great. I fully support adding this syntax.
martint
left a comment
This conflicts with the standard syntax for GROUP BY:
GROUP BY [ <set quantifier> ] <grouping element list>
where <set quantifier> can be ALL or DISTINCT and defaults to ALL if omitted. The quantifier affects the semantics for queries involving grouping sets. Overloading the meaning to indicate how the keys are selected instead of how the rows in the result are de-duplicated is confusing and error prone.
Before we could consider such syntax, we'd need to define the precise semantics and how it interacts and relates to the broader GROUP BY feature.
In particular, here are some inconsistencies: produces a result, but fails with fails with Without precise semantics, it's hard to tell what's the behavior of these queries after this change:
To be clear, the ALL and DISTINCT quantifiers control whether grouping sets that have the same combination of keys are de-duplicated. Therefore, the only way this syntax makes sense is if we give meaning to omitting the grouping set specification: specifically, treating it as equivalent to a single grouping set composed of all the expressions in the group by clause that don’t contain aggregations. In that case, the quantifier is orthogonal to that feature. Allowing one but not the other looks arbitrary and introduces cognitive load for a user, who has to understand that they are somehow connected even though intuitively they should not be.
Another aspect that complicates things conceptually is that the GROUP BY operation occurs before the SELECT clause is computed, so it’s a chicken-and-egg problem to determine which columns are grouping keys and which ones are derived. Also, the GROUP BY clause operates on input columns (those coming from the FROM clause), not on those from the SELECT clause. The implication arrow goes the other way: an expression in the SELECT clause is valid if it’s functionally dependent on the input columns used for computing the grouping sets.
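The de-duplication that the ALL and DISTINCT quantifiers control can be illustrated with a small model. This is a simplified sketch in Python, not Trino's implementation: the helper names `rollup` and `combine` are hypothetical, and the model assumes that multiple grouping elements combine by cross product and that two grouping sets are "the same" when they contain the same columns.

```python
from itertools import product


def rollup(*cols):
    """Expand ROLLUP(a, b) into the grouping sets (a, b), (a), ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def combine(elements, quantifier="ALL"):
    """Cross-multiply grouping elements into a flat list of grouping sets.

    Under DISTINCT, grouping sets with the same columns are kept once.
    Under ALL, repeated grouping sets are kept as they are.
    """
    sets = []
    for combo in product(*elements):
        merged = []
        for grouping_set in combo:
            for col in grouping_set:
                if col not in merged:  # a column listed twice groups the same way
                    merged.append(col)
        sets.append(tuple(merged))

    if quantifier == "DISTINCT":
        seen = set()
        unique = []
        for grouping_set in sets:
            key = frozenset(grouping_set)
            if key not in seen:
                seen.add(key)
                unique.append(grouping_set)
        sets = unique
    return sets


# Models GROUP BY [ALL | DISTINCT] ROLLUP(a), ROLLUP(a)
elements = [rollup("a"), rollup("a")]
print(combine(elements, "ALL"))       # [('a',), ('a',), ('a',), ()]
print(combine(elements, "DISTINCT"))  # [('a',), ()]
```

In this model the quantifier only decides whether repeated grouping sets survive; it says nothing about how the grouping keys are chosen, which is the distinction the comment above draws.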
@martint Thank you for your detailed explanation. Can you suggest an alternative syntax? Or do we not want to add this feature?
Title changed: GROUP BY ALL aggregation → GROUP BY * aggregation
Updated the syntax to
I just had another idea -- instead of
Title changed: GROUP BY IMPLICIT aggregation → GROUP BY AUTO aggregation
I prefer
core/trino-grammar/src/main/antlr4/io/trino/grammar/sql/SqlBase.g4
core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java
core/trino-parser/src/main/java/io/trino/sql/tree/GroupingSets.java
@martint Gentle reminder.
core/trino-parser/src/test/java/io/trino/sql/parser/TestSqlParser.java
core/trino-main/src/test/java/io/trino/sql/query/TestGroupBy.java
Addressed comments.
core/trino-main/src/test/java/io/trino/sql/query/TestGroupBy.java
martint
left a comment
Don't forget to squash the fixup commits.
Description
This syntax allows omitting column positions or names after GROUP BY. For instance,
SELECT name, count(1) FROM test GROUP BY AUTO
will be translated to
SELECT name, count(1) FROM test GROUP BY name
References in other database/query engines
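The translation given in the Description (GROUP BY AUTO becoming an explicit GROUP BY over the non-aggregate select items) can be sketched as follows. This is an illustrative Python model, not the analyzer code from this PR: `expand_group_by_auto` is a hypothetical helper, the caller states which select items contain aggregates instead of parsing SQL, and the empty-key case (no GROUP BY emitted) is an assumption. SQLite is used only to run the explicit form, since it does not support GROUP BY AUTO.

```python
import sqlite3


def expand_group_by_auto(select_items, table):
    """Build the explicit query that GROUP BY AUTO would stand for.

    select_items is a list of (expression, contains_aggregate) pairs.
    Every expression without an aggregate becomes a grouping key.
    """
    keys = [expr for expr, has_aggregate in select_items if not has_aggregate]
    select_list = ", ".join(expr for expr, _ in select_items)
    # Assumption: with no grouping keys, fall back to a global aggregation.
    group_by = f" GROUP BY {', '.join(keys)}" if keys else ""
    return f"SELECT {select_list} FROM {table}{group_by}"


query = expand_group_by_auto([("name", False), ("count(1)", True)], "test")
print(query)  # SELECT name, count(1) FROM test GROUP BY name

# Run the explicit form against a throwaway in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (name TEXT)")
conn.executemany("INSERT INTO test VALUES (?)", [("a",), ("b",), ("a",)])
rows = conn.execute(query + " ORDER BY name").fetchall()
print(rows)  # [('a', 2), ('b', 1)]
```

The sketch derives the keys from the SELECT list, which is exactly the direction of inference the review comments question, so it shows the intended user-facing behavior, not a settled semantics.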
Release notes
(x) Release notes are required, with the following suggested text: