[SPARK-12797] [SQL] Generated TungstenAggregate (without grouping keys) #10840
Test build #49720 has finished for PR 10840 at commit
|
|
What's the difference between this one and #10786? |
|
#10768 had more unrelated changes and was used as a prototype; this is the one that is ready for review.
can you add a case with multiple agg exprs?
|
Looking good, just some minor comments. |
|
Test build #49799 has finished for PR 10840 at commit
|
|
Test build #49808 has finished for PR 10840 at commit
|
|
Test build #2422 has finished for PR 10840 at commit
|
|
Test build #49809 has finished for PR 10840 at commit
|
|
The last commit passed the tests, so I'm going to merge this into master.
|
@davies it would be great if we can separate the generated code into two functions -- one that does the aggregation, and the other that does the output. This way, we can separate this into two "pipelines". cc @nongli
|
@rxin We can do that when there is a grouping key. In this case, it only outputs a single row, and it will usually be among the last few operators.
|
Why not just do it for both cases so it is more unified? I think the point is that we'd want the generated code to reflect more accurately the number of pipelines that are actually used. |
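To make the two-function split discussed above concrete, here is a minimal Java sketch of what an aggregate without grouping keys could look like when the aggregation and the output are kept as separate steps. It is not Spark's actual generated code: the class name `AggWithoutKeysSketch`, the methods `doAggregate` and `output`, and the choice of `sum` and `count` as the aggregate expressions are all assumptions made for illustration, and SQL null handling is left out.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch only; this is not the code Spark generates. */
final class AggWithoutKeysSketch {
    // Aggregation buffer held in plain fields (assumed layout).
    private long sum = 0L;
    private long count = 0L;
    private boolean aggregated = false;

    private final Iterator<Long> input;

    AggWithoutKeysSketch(Iterator<Long> input) {
        this.input = input;
    }

    // First "pipeline": consume every input row and update the buffer.
    private void doAggregate() {
        while (input.hasNext()) {
            long value = input.next();
            sum += value;   // update for sum(x)
            count += 1;     // update for count(x)
        }
    }

    // Second "pipeline": emit the single result row {sum, count}.
    // Note: real SQL sum() over no rows is NULL; this sketch returns 0.
    long[] output() {
        if (!aggregated) {
            doAggregate();
            aggregated = true;
        }
        return new long[] { sum, count };
    }

    public static void main(String[] args) {
        List<Long> rows = Arrays.asList(1L, 2L, 3L, 4L);
        long[] result = new AggWithoutKeysSketch(rows.iterator()).output();
        System.out.println("sum=" + result[0] + ", count=" + result[1]);
    }
}
```

Because there is no grouping key, `output` produces exactly one row, which is why the split matters less here than it would for a grouped aggregate; keeping the two methods separate still makes the pipeline boundary visible in the code.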
expectedMetrics: Map[Long, (String, Map[String, Any])]): Unit = {
  val previousExecutionIds = sqlContext.listener.executionIdToData.keySet
  df.collect()
  withSQLConf("spark.sql.codegen.wholeStage" -> "false") {
@davies, why this change?
In this PR, SQLMetrics are not supported in whole-stage codegen, so the test disables it.
As discussed in #10786, the generated TungstenAggregate does not support imperative functions.
For a query
The generated code will look like:
cc @nongli