-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-17590][SQL] Analyze CTE definitions at once and allow CTE subquery to define CTE #15146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #65586 has finished for PR 15146 at commit
|
|
Test build #65588 has finished for PR 15146 at commit
|
|
|
||
| namedQuery | ||
| : name=identifier AS? '(' queryNoWith ')' | ||
| : name=identifier AS? '(' query ')' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain more about this grammar change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we only allow a query without With used in the CTE's named query. This change loose this limit to allow a query with With to be a named query. So we can define CTE in CTE subquery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, query is defined as "ctes? queryNoWith".
|
|
||
| test("define CTE in CTE subquery") { | ||
| checkAnswer( | ||
| sql("with t2 as (with t1 as (select 1 as b, 2 as c) select b, c from t1) " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use multi-line string with proper ident here? this test is almost unreadable...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
Seems good, but not quite sure, cc @hvanhovell to confirm |
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
|
||
| test("define CTE in CTE subquery") { | ||
| checkAnswer( | ||
| sql("with t2 as (with t1 as (select 1 as b, 2 as c) select b, c from t1) " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
@viirya does this improve performance? |
|
@hvanhovell This is for analyzer change and adds CTE in CTE feature. I don't expect there is performance improvement. |
|
I guess if using the same analyzed plan increases the chance to reuse exchange, then it may improve the performance. Anyway, it is not the purpose of this change. Because the analyzed subquery plan will be changed largely in optimization later, we cannot guarantee this. |
|
Test build #65630 has finished for PR 15146 at commit
|
|
Merging to master. Thanks! |
|
@hvanhovell @cloud-fan Thanks! |
What changes were proposed in this pull request?
We substitute logical plan with CTE definitions in the analyzer rule CTESubstitution. A CTE definition can be used in the logical plan for multiple times, and its analyzed logical plan should be the same. We should not analyze CTE definitions multiple times when they are reused in the query.
By analyzing CTE definitions before substitution, we can support defining CTE in subquery.
How was this patch tested?
Jenkins tests.