@jinxing64 jinxing64 commented Oct 25, 2017

What changes were proposed in this pull request?

Currently, the SQL below fails:

SELECT cnt, k2, k3, grouping__id
FROM
(SELECT count(*) as cnt, k2, k3, grouping__id
FROM t1
GROUP BY k2, k3
GROUPING SETS(k2, k3)) t2

This use case is common in our warehouse and is now supported by Hive.
Could we support it?

How was this patch tested?

Test added

@SparkQA

SparkQA commented Oct 25, 2017

Test build #83039 has finished for PR 19573 at commit c0ecbee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@DonnyZone
Contributor

Is this similar to the issue below?
#19178

@jinxing64
Author

@DonnyZone
Thanks for taking a look.
I don't think it is quite the same.
After #18270, every grouping__id is transformed into GroupingID, so users cannot select grouping__id from a subquery.
Also, after that PR, grouping__id is no longer deprecated. This PR removes spark_grouping_id and simplifies the logic.

"""
|SELECT cnt, k2, k3, grouping__id
|FROM
| (SELECT count(*) as cnt, k2, k3, grouping__id
Member

SELECT cnt, k2, k3, alias_grouping__id
FROM
  (SELECT count(*) as cnt, k2, k3, grouping__id as alias_grouping__id
  FROM (SELECT key, key%2 as k2 , key%3 as k3 FROM src) t1
  GROUP BY k2, k3
  GROUPING SETS(k2, k3)) t2
ORDER BY alias_grouping__id, k2, k3

This is the workaround.

@jinxing64 jinxing64 closed this Oct 27, 2017
@jinxing64
Author

@gatorsmile
Thanks for the reply.
It seems you prefer giving the alias explicitly. I will close this PR and follow your suggestion.
However, our warehouse has many ETL jobs that select grouping__id from a subquery, so we cannot migrate seamlessly.

@gatorsmile
Member

@jinxing64 You can keep it open, but it might take more time to review the fix.

@jinxing64
Author

Thanks a lot. I will leave it open (if that's OK). A friend of mine at another company is also affected by this issue. Maybe others can share some ideas here.
Thanks again for commenting. It would be great if you could review the PR when you have time. I can keep working on it :)

@jinxing64 jinxing64 reopened this Oct 27, 2017
@SparkQA

SparkQA commented Oct 27, 2017

Test build #83131 has finished for PR 19573 at commit c0ecbee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 8, 2017

Test build #83594 has finished for PR 19573 at commit a593442.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 1, 2018

Test build #86928 has finished for PR 19573 at commit a593442.

  • This patch fails PySpark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 14, 2019

Test build #107638 has finished for PR 19573 at commit a593442.

  • This patch fails due to an unknown error code, -9.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@jinxing64 jinxing64 closed this Jul 14, 2019
@jinxing64 jinxing64 deleted the SPARK-22350 branch July 14, 2019 12:00