[SPARK-18148][SQL] Misleading Error Message for Aggregation Without Window/GroupBy #15672
Closed
Commits (6):
494ef09 improve error message when checkAnalysis fail on Aggregate operator. (jiangxb1987)
350b7a3 better format. (jiangxb1987)
8986207 add test cases. (jiangxb1987)
033f43f map aggregate expressions to sql (jiangxb1987)
2542f7f modify test cases. (jiangxb1987)
43f8fa6 modify test cases. (jiangxb1987)
41 changes: 29 additions & 12 deletions
sql/core/src/test/resources/sql-tests/inputs/group-by.sql
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,17 +1,34 @@ | ||
| -- Temporary data. | ||
| create temporary view myview as values 128, 256 as v(int_col); | ||
| -- Test data. | ||
| CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES | ||
| (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null) | ||
| AS testData(a, b); | ||
|
|
||
| -- group by should produce all input rows, | ||
| select int_col, count(*) from myview group by int_col; | ||
| -- Aggregate with empty GroupBy expressions. | ||
| SELECT a, COUNT(b) FROM testData; | ||
| SELECT COUNT(a), COUNT(b) FROM testData; | ||
|
|
||
| -- group by should produce a single row. | ||
| select 'foo', count(*) from myview group by 1; | ||
| -- Aggregate with non-empty GroupBy expressions. | ||
|
Contributor review comment: these are already tested earlier, aren't they?
||
| SELECT a, COUNT(b) FROM testData GROUP BY a; | ||
| SELECT a, COUNT(b) FROM testData GROUP BY b; | ||
| SELECT COUNT(a), COUNT(b) FROM testData GROUP BY a; | ||
|
|
||
| -- group-by should not produce any rows (whole stage code generation). | ||
| select 'foo' from myview where int_col == 0 group by 1; | ||
| -- Aggregate grouped by literals. | ||
| SELECT 'foo', COUNT(a) FROM testData GROUP BY 1; | ||
|
|
||
| -- group-by should not produce any rows (hash aggregate). | ||
| select 'foo', approx_count_distinct(int_col) from myview where int_col == 0 group by 1; | ||
| -- Aggregate grouped by literals (whole stage code generation). | ||
| SELECT 'foo' FROM testData WHERE a = 0 GROUP BY 1; | ||
|
|
||
| -- group-by should not produce any rows (sort aggregate). | ||
| select 'foo', max(struct(int_col)) from myview where int_col == 0 group by 1; | ||
| -- Aggregate grouped by literals (hash aggregate). | ||
| SELECT 'foo', APPROX_COUNT_DISTINCT(a) FROM testData WHERE a = 0 GROUP BY 1; | ||
|
|
||
| -- Aggregate grouped by literals (sort aggregate). | ||
| SELECT 'foo', MAX(STRUCT(a)) FROM testData WHERE a = 0 GROUP BY 1; | ||
|
|
||
| -- Aggregate with complex GroupBy expressions. | ||
| SELECT a + b, COUNT(b) FROM testData GROUP BY a + b; | ||
| SELECT a + 2, COUNT(b) FROM testData GROUP BY a + 1; | ||
| SELECT a + 1 + 1, COUNT(b) FROM testData GROUP BY a + 1; | ||
|
|
||
| -- Aggregate with nulls. | ||
| SELECT SKEWNESS(a), KURTOSIS(a), MIN(a), MAX(a), AVG(a), VARIANCE(a), STDDEV(a), SUM(a), COUNT(a) | ||
| FROM testData; | ||
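The complex GroupBy cases above (`a + 2` versus `a + 1 + 1`, both grouped by `a + 1`) exercise the rule that a select expression must be an aggregate, match a grouping expression, or be built only from such pieces. Below is a minimal Python sketch of that rule, assuming a toy tuple-based expression tree. The `is_valid` helper and the tuple encoding are hypothetical illustrations, not Spark's actual analyzer code.

```python
# Toy expression trees (hypothetical encoding, for illustration only):
#   ("attr", name)          a column reference
#   ("lit", value)          a literal
#   ("add", left, right)    left + right
#   ("agg", func, child)    an aggregate function call


def is_valid(expr, grouping):
    """Return True if expr may appear in the SELECT list of an aggregate query."""
    kind = expr[0]
    if kind == "agg":
        return True   # aggregate functions are always allowed
    if expr in grouping:
        return True   # exactly matches a grouping expression
    if kind == "attr":
        return False  # bare column that is not grouped
    if kind == "lit":
        return True   # literals carry no column references
    # Any other node is valid only if every child is valid.
    return all(is_valid(child, grouping) for child in expr[1:])


a = ("attr", "a")
b = ("attr", "b")
a_plus_1 = ("add", a, ("lit", 1))

# SELECT a + 2 ... GROUP BY a + 1: rejected, `a` itself is not grouped.
rejected = is_valid(("add", a, ("lit", 2)), [a_plus_1])

# SELECT a + 1 + 1 ... GROUP BY a + 1: accepted, parsed as (a + 1) + 1.
accepted = is_valid(("add", a_plus_1, ("lit", 1)), [a_plus_1])
```

In this sketch an empty grouping list models a query with no GROUP BY clause, so `is_valid(a, [])` is false while `is_valid(("agg", "count", b), [])` is true.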
116 changes: 99 additions & 17 deletions
sql/core/src/test/resources/sql-tests/results/group-by.sql.out
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,51 +1,133 @@ | ||
| -- Automatically generated by SQLQueryTestSuite | ||
| -- Number of queries: 6 | ||
| -- Number of queries: 14 | ||
|
|
||
|
|
||
| -- !query 0 | ||
| create temporary view myview as values 128, 256 as v(int_col) | ||
| CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES | ||
| (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null) | ||
| AS testData(a, b) | ||
| -- !query 0 schema | ||
| struct<> | ||
| -- !query 0 output | ||
|
|
||
|
|
||
|
|
||
| -- !query 1 | ||
| select int_col, count(*) from myview group by int_col | ||
| SELECT a, COUNT(b) FROM testData | ||
| -- !query 1 schema | ||
| struct<int_col:int,count(1):bigint> | ||
| struct<> | ||
| -- !query 1 output | ||
| 128 1 | ||
| 256 1 | ||
| org.apache.spark.sql.AnalysisException | ||
| grouping expressions sequence is empty, and 'testdata.`a`' is not an aggregate function. Wrap '(count(testdata.`b`) AS `count(b)`)' in windowing function(s) or wrap 'testdata.`a`' in first() (or first_value) if you don't care which value you get.; | ||
|
|
||
|
|
||
| -- !query 2 | ||
| select 'foo', count(*) from myview group by 1 | ||
| SELECT COUNT(a), COUNT(b) FROM testData | ||
| -- !query 2 schema | ||
| struct<foo:string,count(1):bigint> | ||
| struct<count(a):bigint,count(b):bigint> | ||
| -- !query 2 output | ||
| foo 2 | ||
| 7 7 | ||
|
|
||
|
|
||
| -- !query 3 | ||
| select 'foo' from myview where int_col == 0 group by 1 | ||
| SELECT a, COUNT(b) FROM testData GROUP BY a | ||
| -- !query 3 schema | ||
| struct<foo:string> | ||
| struct<a:int,count(b):bigint> | ||
| -- !query 3 output | ||
|
|
||
| 1 2 | ||
| 2 2 | ||
| 3 2 | ||
| NULL 1 | ||
|
|
||
|
|
||
| -- !query 4 | ||
| select 'foo', approx_count_distinct(int_col) from myview where int_col == 0 group by 1 | ||
| SELECT a, COUNT(b) FROM testData GROUP BY b | ||
| -- !query 4 schema | ||
| struct<foo:string,approx_count_distinct(int_col):bigint> | ||
| struct<> | ||
| -- !query 4 output | ||
|
|
||
| org.apache.spark.sql.AnalysisException | ||
| expression 'testdata.`a`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.; | ||
|
|
||
|
|
||
| -- !query 5 | ||
| select 'foo', max(struct(int_col)) from myview where int_col == 0 group by 1 | ||
| SELECT COUNT(a), COUNT(b) FROM testData GROUP BY a | ||
| -- !query 5 schema | ||
| struct<foo:string,max(struct(int_col)):struct<int_col:int>> | ||
| struct<count(a):bigint,count(b):bigint> | ||
| -- !query 5 output | ||
| 0 1 | ||
| 2 2 | ||
| 2 2 | ||
| 3 2 | ||
|
|
||
|
|
||
| -- !query 6 | ||
| SELECT 'foo', COUNT(a) FROM testData GROUP BY 1 | ||
| -- !query 6 schema | ||
| struct<foo:string,count(a):bigint> | ||
| -- !query 6 output | ||
| foo 7 | ||
|
|
||
|
|
||
| -- !query 7 | ||
| SELECT 'foo' FROM testData WHERE a = 0 GROUP BY 1 | ||
| -- !query 7 schema | ||
| struct<foo:string> | ||
| -- !query 7 output | ||
|
|
||
|
|
||
|
|
||
| -- !query 8 | ||
| SELECT 'foo', APPROX_COUNT_DISTINCT(a) FROM testData WHERE a = 0 GROUP BY 1 | ||
| -- !query 8 schema | ||
| struct<foo:string,approx_count_distinct(a):bigint> | ||
| -- !query 8 output | ||
|
|
||
|
|
||
|
|
||
| -- !query 9 | ||
| SELECT 'foo', MAX(STRUCT(a)) FROM testData WHERE a = 0 GROUP BY 1 | ||
| -- !query 9 schema | ||
| struct<foo:string,max(struct(a)):struct<a:int>> | ||
| -- !query 9 output | ||
|
|
||
|
|
||
|
|
||
| -- !query 10 | ||
| SELECT a + b, COUNT(b) FROM testData GROUP BY a + b | ||
| -- !query 10 schema | ||
| struct<(a + b):int,count(b):bigint> | ||
| -- !query 10 output | ||
| 2 1 | ||
| 3 2 | ||
| 4 2 | ||
| 5 1 | ||
| NULL 1 | ||
|
|
||
|
|
||
| -- !query 11 | ||
| SELECT a + 2, COUNT(b) FROM testData GROUP BY a + 1 | ||
| -- !query 11 schema | ||
| struct<> | ||
| -- !query 11 output | ||
| org.apache.spark.sql.AnalysisException | ||
| expression 'testdata.`a`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.; | ||
|
|
||
|
|
||
| -- !query 12 | ||
| SELECT a + 1 + 1, COUNT(b) FROM testData GROUP BY a + 1 | ||
| -- !query 12 schema | ||
| struct<((a + 1) + 1):int,count(b):bigint> | ||
| -- !query 12 output | ||
| 3 2 | ||
| 4 2 | ||
| 5 2 | ||
| NULL 1 | ||
|
|
||
|
|
||
| -- !query 13 | ||
| SELECT SKEWNESS(a), KURTOSIS(a), MIN(a), MAX(a), AVG(a), VARIANCE(a), STDDEV(a), SUM(a), COUNT(a) | ||
| FROM testData | ||
| -- !query 13 schema | ||
| struct<skewness(CAST(a AS DOUBLE)):double,kurtosis(CAST(a AS DOUBLE)):double,min(a):int,max(a):int,avg(a):double,var_samp(CAST(a AS DOUBLE)):double,stddev_samp(CAST(a AS DOUBLE)):double,sum(a):bigint,count(a):bigint> | ||
| -- !query 13 output | ||
| -0.2723801058145729 -1.5069204152249134 1 3 2.142857142857143 0.8095238095238094 0.8997354108424372 15 7 |
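Some of the expected values above can be cross-checked outside Spark. Below is a minimal Python sketch, assuming only the nine rows of testData and Python's standard `statistics` module. It models the SQL convention that aggregates skip NULL inputs and does not attempt to reproduce SKEWNESS or KURTOSIS. The helper name `count_b_grouped_by_a` is made up for this example.

```python
from statistics import mean, stdev, variance

# The nine (a, b) rows of testData; None stands in for SQL NULL.
rows = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2),
        (None, 1), (3, None), (None, None)]

# Aggregates over column a ignore NULLs.
a_values = [a for a, _ in rows if a is not None]

summary = {
    "min": min(a_values),
    "max": max(a_values),
    "avg": mean(a_values),
    "var_samp": variance(a_values),   # sample variance (n - 1 denominator)
    "stddev_samp": stdev(a_values),   # sample standard deviation
    "sum": sum(a_values),
    "count": len(a_values),
}


def count_b_grouped_by_a(data):
    """Model SELECT a, COUNT(b) ... GROUP BY a: NULL forms its own group."""
    counts = {}
    for a, b in data:
        counts.setdefault(a, 0)
        if b is not None:  # COUNT(b) skips NULL values of b
            counts[a] += 1
    return counts


grouped = count_b_grouped_by_a(rows)
```

Under these assumptions `summary["count"]` is 7 and `summary["sum"]` is 15, and `grouped` gives 2 for each of the keys 1, 2 and 3 and 1 for the NULL group, in line with the outputs of query 13 and query 3.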
Review comment: Also, can we just augment the initial dataset rather than introducing a new testData? It'd be better to use one dataset.