-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-41195][SQL] Support PIVOT/UNPIVOT with join children #38713
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't add tests for pivot because:
- there is no pivot parser suite
- pivot/unpivot syntax is exactly the same regarding joins, no need to test both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right.
As you say PIVOT and UNPIVOT are like JOINs, so why are do you have this strange ordered optional clauses instead of merging them in as if they were another type of join.
I can't put my finger on exactly how it breaks without running a bunch of tests.
For example I should be able to chain a string UNPIVOTs and PIVOTs in any order without the need for a JOIN (or a braces) in between. I don't see how that works here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried simply:
| : (joinType) JOIN LATERAL? right=relationPrimary joinCriteria? pivotClause? unpivotClause? | |
| | NATURAL joinType JOIN LATERAL? right=relationPrimary pivotClause? unpivotClause? | |
| : (joinType) JOIN LATERAL? right=relationPrimary joinCriteria? | |
| | NATURAL joinType JOIN LATERAL? right=relationPrimary | |
| | pivotClause | |
| | unpivotClause |
Also removing the relation entries above?
| * Join one more [[LogicalPlan]] to the current logical plan. | ||
| */ | ||
| private def withJoinRelations(base: LogicalPlan, ctx: RelationContext): LogicalPlan = { | ||
| ctx.joinRelation.asScala.foldLeft(base) { (left, join) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual code change is very small, just remove this loop and rename a few variables.
gengliangwang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 with the proposal
|
Thanks, merging to master |
viirya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
### What changes were proposed in this pull request? Today, our SQL parser only supports PIVOT/UNPIVOT at the end of the FROM clause. This is quite limited and it's better to allow PIVOT/UNPIVOT in the join children as well. As a reference, snowflake supports it: https://docs.snowflake.com/en/sql-reference/constructs/from.html This PR makes PIVOT/UNPIVOT the same level as JOIN. Wherever you can use JOIN to extend a relation, you can also use PIVOT/UNPIVOT. Many SQL syntaxes are supported after this PR ``` FROM t1 PIVOT/UNPIVOT ... JOIN t2 // pivot/unpivot the left table FROM t1 JOIN t2 PIVOT/UNPIVOT ... // pivot/unpivot the join result. This is the same before this PR FROM t1 JOIN (t2 PIVOT/UNPIVOT ...) // pivot/unpivot the right table FROM t1 PIVOT/UNPIVOT ... PIVOT/UNPIVOT // nested pivot/unpivot ``` ### Why are the changes needed? make PIVOT/UNPIVOT syntax more flexible ### Does this PR introduce _any_ user-facing change? Yes, new SQL syntax without any breaking change ### How was this patch tested? new parser tests Closes apache#38713 from cloud-fan/pivot. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Gengliang Wang <[email protected]>
### What changes were proposed in this pull request? Today, our SQL parser only supports PIVOT/UNPIVOT at the end of the FROM clause. This is quite limited and it's better to allow PIVOT/UNPIVOT in the join children as well. As a reference, snowflake supports it: https://docs.snowflake.com/en/sql-reference/constructs/from.html This PR makes PIVOT/UNPIVOT the same level as JOIN. Wherever you can use JOIN to extend a relation, you can also use PIVOT/UNPIVOT. Many SQL syntaxes are supported after this PR ``` FROM t1 PIVOT/UNPIVOT ... JOIN t2 // pivot/unpivot the left table FROM t1 JOIN t2 PIVOT/UNPIVOT ... // pivot/unpivot the join result. This is the same before this PR FROM t1 JOIN (t2 PIVOT/UNPIVOT ...) // pivot/unpivot the right table FROM t1 PIVOT/UNPIVOT ... PIVOT/UNPIVOT // nested pivot/unpivot ``` ### Why are the changes needed? make PIVOT/UNPIVOT syntax more flexible ### Does this PR introduce _any_ user-facing change? Yes, new SQL syntax without any breaking change ### How was this patch tested? new parser tests Closes apache#38713 from cloud-fan/pivot. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Gengliang Wang <[email protected]>
### What changes were proposed in this pull request? Today, our SQL parser only supports PIVOT/UNPIVOT at the end of the FROM clause. This is quite limited and it's better to allow PIVOT/UNPIVOT in the join children as well. As a reference, snowflake supports it: https://docs.snowflake.com/en/sql-reference/constructs/from.html This PR makes PIVOT/UNPIVOT the same level as JOIN. Wherever you can use JOIN to extend a relation, you can also use PIVOT/UNPIVOT. Many SQL syntaxes are supported after this PR ``` FROM t1 PIVOT/UNPIVOT ... JOIN t2 // pivot/unpivot the left table FROM t1 JOIN t2 PIVOT/UNPIVOT ... // pivot/unpivot the join result. This is the same before this PR FROM t1 JOIN (t2 PIVOT/UNPIVOT ...) // pivot/unpivot the right table FROM t1 PIVOT/UNPIVOT ... PIVOT/UNPIVOT // nested pivot/unpivot ``` ### Why are the changes needed? make PIVOT/UNPIVOT syntax more flexible ### Does this PR introduce _any_ user-facing change? Yes, new SQL syntax without any breaking change ### How was this patch tested? new parser tests Closes apache#38713 from cloud-fan/pivot. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Gengliang Wang <[email protected]>
What changes were proposed in this pull request?
Today, our SQL parser only supports PIVOT/UNPIVOT at the end of the FROM clause. This is quite limited and it's better to allow PIVOT/UNPIVOT in the join children as well. As a reference, snowflake supports it: https://docs.snowflake.com/en/sql-reference/constructs/from.html
This PR makes PIVOT/UNPIVOT the same level as JOIN. Wherever you can use JOIN to extend a relation, you can also use PIVOT/UNPIVOT. Many SQL syntaxes are supported after this PR
Why are the changes needed?
make PIVOT/UNPIVOT syntax more flexible
Does this PR introduce any user-facing change?
Yes, new SQL syntax without any breaking change
How was this patch tested?
new parser tests