[SPARK-13221] [SQL] Fixing GroupingSets when Aggregate Functions Containing GroupBy Columns #11100
Test build #50855 has finished for PR 11100 at commit
Test build #50869 has finished for PR 11100 at commit
test this please
Test build #50872 has finished for PR 11100 at commit
retest this please
Test build #50892 has finished for PR 11100 at commit
retest this please
cc @yhuai @marmbrus @rxin Could you check whether this is an appropriate fix? After checking the history, I found that you were involved in the discussion of the original fix for rollup and cube. : ) @liancheng This is blocking the JIRA https://issues.apache.org/jira/browse/SPARK-12720. I will submit a PR after this issue is addressed. Sorry for the delays. Thanks!
val aggExprs = g.aggregations.map(_.transform {
  case u: UnresolvedAttribute if resolver(u.name, VirtualColumn.groupingIdName) => gid
}.asInstanceOf[NamedExpression])
g.copy(aggregations = aggExprs, groupByExprs = g.groupByExprs :+ gid)
This one has a potential bug. Will fix it soon.
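For readers less familiar with Catalyst: the snippet above rewrites each aggregate expression, replacing any unresolved reference whose name matches the grouping-id column with the resolved gid attribute, and then appends gid to the group-by list. The sketch below models that step with a tiny, hypothetical expression tree. Expr, UnresolvedAttribute, AttributeRef, Sum, Add and the name "grouping__id" are stand-ins rather than Spark classes, and the case-insensitive resolver is an assumption matching Spark's default case-insensitive analysis.

```scala
// Toy model of resolving the grouping-id reference inside aggregate expressions.
// All types and names here are illustrative stand-ins, not Catalyst classes.
object GroupingIdToy {
  sealed trait Expr
  final case class UnresolvedAttribute(name: String) extends Expr
  final case class AttributeRef(name: String, exprId: Long) extends Expr
  final case class Sum(child: Expr) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  // Hypothetical stand-in for VirtualColumn.groupingIdName.
  val groupingIdName = "grouping__id"

  // Assumed case-insensitive name comparison, standing in for the analyzer's resolver.
  val resolver: (String, String) => Boolean = (a, b) => a.equalsIgnoreCase(b)

  // Top-down rewrite: apply `rule` to the node if it matches, then recurse into the children.
  def transform(e: Expr)(rule: PartialFunction[Expr, Expr]): Expr =
    rule.applyOrElse(e, (x: Expr) => x) match {
      case Sum(c)    => Sum(transform(c)(rule))
      case Add(l, r) => Add(transform(l)(rule), transform(r)(rule))
      case leaf      => leaf
    }

  // Replace grouping-id references with `gid` and append `gid` to the group-by list.
  def resolveGroupingId(
      aggregations: Seq[Expr],
      groupByExprs: Seq[Expr],
      gid: AttributeRef): (Seq[Expr], Seq[Expr]) = {
    val aggExprs = aggregations.map { agg =>
      transform(agg) {
        case u: UnresolvedAttribute if resolver(u.name, groupingIdName) => gid
      }
    }
    (aggExprs, groupByExprs :+ gid)
  }
}
```

Aggregates that do not mention the grouping id, such as Sum(a), come back unchanged, because the rule only fires on matching unresolved references.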
Test build #50907 has finished for PR 11100 at commit
Test build #50909 has finished for PR 11100 at commit
retest this please
Test build #50910 has finished for PR 11100 at commit
retest this please
Test build #50925 has finished for PR 11100 at commit
retest this please
Test build #51080 has finished for PR 11100 at commit
@davies @hvanhovell @aray I just realized that you recently reviewed a related PR, #10677. Could you also review this one? I checked that the results are still wrong in the latest code after #10677 was merged. Thanks!
Test build #51087 has finished for PR 11100 at commit
LGTM
Test build #51100 has started for PR 11100 at commit
jenkins, test this please
Test build #51111 has finished for PR 11100 at commit
else {
  g
}
case x: GroupingSets =>
Add the guard if g.expressions.forall(_.resolved) to make sure that all the expressions are resolved.
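The guard matters because the analyzer keeps re-applying its rules until the plan stops changing, so a GroupingSets node whose expressions are not yet resolved can simply be skipped and expanded on a later pass. The sketch below is a toy model of that fixed-point loop, not Catalyst's RuleExecutor. Expr, GroupingSets, Expanded, resolveOne and the two expand rules are hypothetical stand-ins, and it assumes the expand rule runs before resolution within each pass. In this model, expanding too early carries the unresolved expression into the Expanded node, where the resolution rule no longer reaches it.

```scala
// Toy fixed-point loop showing why a rewrite rule should wait until its inputs are resolved.
// All names are illustrative; this is not Catalyst's RuleExecutor.
object FixedPointToy {
  final case class Expr(name: String, resolved: Boolean)
  sealed trait Plan
  final case class GroupingSets(exprs: Seq[Expr]) extends Plan
  final case class Expanded(exprs: Seq[Expr]) extends Plan

  // Hypothetical resolution rule: resolves one expression of a GroupingSets node per pass.
  def resolveOne(p: Plan): Plan = p match {
    case GroupingSets(es) =>
      val i = es.indexWhere(!_.resolved)
      if (i < 0) p else GroupingSets(es.updated(i, es(i).copy(resolved = true)))
    case other => other
  }

  // Expands only once every expression is resolved (the suggested guard).
  def expandGuarded(p: Plan): Plan = p match {
    case GroupingSets(es) if es.forall(_.resolved) => Expanded(es)
    case other => other
  }

  // Expands unconditionally, even if some expressions are still unresolved.
  def expandUnguarded(p: Plan): Plan = p match {
    case GroupingSets(es) => Expanded(es)
    case other => other
  }

  // Applies expand-then-resolve repeatedly until the plan stops changing.
  def runToFixedPoint(start: Plan, expand: Plan => Plan): Plan = {
    var current = start
    var changed = true
    while (changed) {
      val next = resolveOne(expand(current))
      changed = next != current
      current = next
    }
    current
  }
}
```

Starting from a node with one unresolved expression, the guarded rule waits one pass and then expands fully resolved expressions, while the unguarded rule expands immediately and leaves the unresolved expression behind.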
@gatorsmile Thanks for working on this. The secret column name For the other bug, it could be fixed by resolving all the expressions before GroupingSets (a one-line change).
@gatorsmile I checked that your two tests pass with these two tiny changes.
Yeah, my first fix is very similar to what you proposed above. I will remember what you said regarding BTW, I just tried the code changes and they work well in my local environment. I have updated the code. Thanks!
Test build #51169 has finished for PR 11100 at commit
@gatorsmile I think the first commit should be enough. Since 2.0 is the best chance to deprecate GROUPING__ID, we should do that BEFORE the 2.0 release.
Sure, I will revert the changes. Thank you!
Test build #51197 has finished for PR 11100 at commit
Test build #51229 has finished for PR 11100 at commit
@davies Already deprecated I also tried to output a better error message when users manually specify
GroupingSets(bitmasks(r), groupByExprs, child, aggregateExpressions)
// Ensure all the expressions have been resolved.
case g: GroupingSets if g.expressions.exists(!_.resolved) => g
case x: GroupingSets =>
It would be clearer to move the if guard into the next case.
Ok, will do. Thanks!
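The suggestion works because of how a rule behaves inside a tree transform: nodes the rule does not match are left unchanged, so a separate case that returns unresolved nodes untouched becomes redundant once the guard sits on the case that does the work. A minimal sketch with hypothetical types (Expr, Plan, GroupingSets, Expanded), applying each rule with applyOrElse and an identity fallback to mimic that behavior:

```scala
// Toy comparison of two equivalent guard placements for a rewrite rule.
// Types are illustrative stand-ins for logical plan nodes.
object GuardPlacementToy {
  final case class Expr(name: String, resolved: Boolean)
  sealed trait Plan
  final case class GroupingSets(exprs: Seq[Expr]) extends Plan
  final case class Expanded(exprs: Seq[Expr]) extends Plan

  // Earlier style: an explicit case that returns unresolved nodes unchanged.
  val separateCase: PartialFunction[Plan, Plan] = {
    case g: GroupingSets if g.exprs.exists(!_.resolved) => g
    case x: GroupingSets => Expanded(x.exprs)
  }

  // Suggested style: the guard sits on the case that does the work.
  val guardedCase: PartialFunction[Plan, Plan] = {
    case x: GroupingSets if x.exprs.forall(_.resolved) => Expanded(x.exprs)
  }

  // Nodes the rule does not match are returned unchanged.
  def applyRule(p: Plan, rule: PartialFunction[Plan, Plan]): Plan =
    rule.applyOrElse(p, (q: Plan) => q)
}
```

Both rules give the same result for resolved and unresolved nodes; the guarded form just states the precondition where the rewrite happens.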
@gatorsmile I think you can check
@davies As you suggested, the latest commit has the following changes:
Thanks!
retest this please
Test build #51273 has finished for PR 11100 at commit
Whoops, triggered a build unnecessarily.
@gatorsmile do we still need
@hvanhovell Thank you for your reviews! Although we deprecate I like this change. Users are no longer allowed to select or query this hidden/secret column.
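One way to model a hidden column is to keep the internal grouping-id attribute out of user-facing name resolution and to report a clear error for the deprecated name. The sketch below is a hypothetical illustration of that idea, not the analyzer's actual code: the internal name spark_grouping_id, both error messages, and the Attr, visibleOutput and resolve helpers are assumptions made for this example.

```scala
// Hypothetical model of name resolution that hides an internal grouping-id column.
// The column names and error messages are illustrative assumptions, not Spark's.
object HiddenColumnToy {
  final case class Attr(name: String, exprId: Long)

  // Assumed internal name of the grouping-id attribute.
  val internalGroupingIdName = "spark_grouping_id"
  // The deprecated user-facing name discussed in this PR.
  val deprecatedName = "GROUPING__ID"

  // Only columns other than the internal grouping id take part in user name resolution.
  def visibleOutput(output: Seq[Attr]): Seq[Attr] =
    output.filterNot(_.name.equalsIgnoreCase(internalGroupingIdName))

  // Resolves a user-written column name, case-insensitively, against the visible output.
  def resolve(name: String, output: Seq[Attr]): Either[String, Attr] =
    if (name.equalsIgnoreCase(deprecatedName)) {
      Left(s"$deprecatedName is deprecated and cannot be selected")
    } else {
      visibleOutput(output).find(_.name.equalsIgnoreCase(name)).toRight(s"cannot resolve '$name'")
    }
}
```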
Test build #51275 has finished for PR 11100 at commit
LGTM, we keep the
Using GroupingSets generates a wrong result when aggregate functions contain group-by columns.
This PR fixes it. Since the code changes are very small, maybe we can also merge it into 1.6.
For example, the following query returns a wrong result:
Before the fix, the results look like this:
After the fix, the results become correct:
UPDATE: This PR also deprecated the external column: GROUPING__ID.
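The class of bug can be illustrated with a simplified model of how grouping sets are evaluated: each input row is duplicated once per grouping set, and the group-by columns that are not part of that set are replaced by NULL. If an aggregate such as SUM(a) reads the nulled copy of a instead of its original value, the rows for grouping sets that drop a (here, the grand total) come out wrong. The sketch below models ROLLUP(a) over a single column. Row, Expanded, sumBuggy and sumFixed are hypothetical names, Option stands in for SQL NULL, and carrying the original value next to the nulled key is an illustration of the idea rather than a transcription of the patch.

```scala
// Simplified model of ROLLUP(a), i.e. GROUPING SETS ((a), ()), over a single column.
// Option[Int] stands in for a nullable SQL value; names are illustrative.
object GroupingSetsToy {
  final case class Row(a: Int)

  // One expanded copy of an input row: the grouping key (None when `a` is dropped by
  // this grouping set), the grouping id, and the original, never-nulled value of `a`.
  final case class Expanded(key: Option[Int], gid: Int, originalA: Int)

  // gid 0 keeps `a` in the grouping set; gid 1 is the grand total, where `a` is nulled.
  def expand(rows: Seq[Row]): Seq[Expanded] =
    rows.flatMap(r => Seq(Expanded(Some(r.a), 0, r.a), Expanded(None, 1, r.a)))

  // Buggy: SUM(a) reads the nulled grouping key, so the grand total sums only NULLs.
  def sumBuggy(rows: Seq[Row]): Map[(Option[Int], Int), Option[Int]] =
    expand(rows).groupBy(e => (e.key, e.gid)).map { case (k, es) =>
      val nonNull = es.flatMap(_.key)
      k -> (if (nonNull.isEmpty) None else Some(nonNull.sum))
    }

  // Fixed: SUM(a) reads the original value carried alongside the grouping key.
  def sumFixed(rows: Seq[Row]): Map[(Option[Int], Int), Option[Int]] =
    expand(rows).groupBy(e => (e.key, e.gid)).map { case (k, es) =>
      k -> Some(es.map(_.originalA).sum)
    }
}
```

In this model, for input rows a = 1, 2, 2, the grand-total row (gid 1) gets None from sumBuggy and Some(5) from sumFixed, while the per-value rows agree.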