@cloud-fan cloud-fan commented Nov 18, 2022

What changes were proposed in this pull request?

Today, our SQL parser only supports PIVOT/UNPIVOT at the end of the FROM clause. This is quite limiting, and it would be better to allow PIVOT/UNPIVOT on join children as well. For reference, Snowflake supports this: https://docs.snowflake.com/en/sql-reference/constructs/from.html

This PR puts PIVOT/UNPIVOT at the same level as JOIN: wherever you can use JOIN to extend a relation, you can also use PIVOT/UNPIVOT. The following forms are supported after this PR:

FROM t1 PIVOT/UNPIVOT ... JOIN t2  // pivot/unpivot the left table
FROM t1 JOIN t2 PIVOT/UNPIVOT ...  // pivot/unpivot the join result; same as before this PR
FROM t1 JOIN (t2 PIVOT/UNPIVOT ...)  // pivot/unpivot the right table
FROM t1 PIVOT/UNPIVOT ... PIVOT/UNPIVOT // nested pivot/unpivot
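
To make the "same level as JOIN" idea concrete, here is a minimal Python sketch of a toy recursive-descent parser. It is not Spark's ANTLR grammar or its AstBuilder; it assumes a simplified token stream in which PIVOT and UNPIVOT take no arguments and joins have no criteria. The function names (`tokenize`, `parse_primary`, `parse_relation`, `parse`) and the tuple-shaped plans are hypothetical, chosen only for illustration.

```python
# Toy model only: real PIVOT/UNPIVOT clauses take arguments, which are omitted here.
EXTENSIONS = {"PIVOT", "UNPIVOT"}


def tokenize(text):
    """Split a FROM-clause body into tokens; parentheses become separate tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_primary(tokens, pos):
    """Parse a table name or a parenthesized relation; return (plan, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "(":
        plan, pos = parse_relation(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return plan, pos + 1
    return tokens[pos], pos + 1


def parse_relation(tokens, pos=0):
    """Parse `primary (JOIN primary | PIVOT | UNPIVOT)*`, folding left to right."""
    plan, pos = parse_primary(tokens, pos)
    while pos < len(tokens) and tokens[pos] != ")":
        tok = tokens[pos]
        if tok == "JOIN":
            # A join extends the plan built so far with a new right-hand side.
            right, pos = parse_primary(tokens, pos + 1)
            plan = ("JOIN", plan, right)
        elif tok in EXTENSIONS:
            # PIVOT/UNPIVOT wraps whatever plan has been built so far.
            plan = (tok, plan)
            pos += 1
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
    return plan, pos


def parse(text):
    """Parse a whole FROM-clause body and reject trailing input."""
    tokens = tokenize(text)
    plan, pos = parse_relation(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return plan
```

Because every extension wraps the plan accumulated so far, `t1 PIVOT JOIN t2` pivots only the left table, `t1 JOIN t2 PIVOT` pivots the join result, and parentheses are needed only to apply an extension to the right side of a join.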

Why are the changes needed?

To make the PIVOT/UNPIVOT syntax more flexible.

Does this PR introduce any user-facing change?

Yes. It adds new SQL syntax without any breaking changes.

How was this patch tested?

New parser tests.

@github-actions github-actions bot added the SQL label Nov 18, 2022
Contributor Author

I didn't add tests for pivot because:

  1. there is no pivot parser suite
  2. the pivot and unpivot syntax is exactly the same with respect to joins, so there is no need to test both

@cloud-fan
Contributor Author

cc @viirya @MaxGekk

Contributor

This doesn't look right.
As you say, PIVOT and UNPIVOT are like JOINs, so why do you have these strangely ordered optional clauses instead of merging them in as if they were another type of join?

I can't put my finger on exactly how it breaks without running a bunch of tests.
For example, I should be able to chain a string of UNPIVOTs and PIVOTs in any order without the need for a JOIN (or braces) in between. I don't see how that works here.

Comment on lines 704 to 705
Contributor

@srielau srielau Nov 18, 2022

Have you tried simply:

Suggested change
- : (joinType) JOIN LATERAL? right=relationPrimary joinCriteria? pivotClause? unpivotClause?
- | NATURAL joinType JOIN LATERAL? right=relationPrimary pivotClause? unpivotClause?
+ : (joinType) JOIN LATERAL? right=relationPrimary joinCriteria?
+ | NATURAL joinType JOIN LATERAL? right=relationPrimary
+ | pivotClause
+ | unpivotClause

Also removing the relation entries above?

 * Join one more [[LogicalPlan]] to the current logical plan.
 */
private def withJoinRelations(base: LogicalPlan, ctx: RelationContext): LogicalPlan = {
  ctx.joinRelation.asScala.foldLeft(base) { (left, join) =>
Contributor Author

@cloud-fan cloud-fan Nov 21, 2022

The actual code change is very small: it just removes this loop and renames a few variables.

@github-actions github-actions bot added the DOCS label Nov 21, 2022
Member

@gengliangwang gengliangwang left a comment

+1 with the proposal

@gengliangwang
Member

Thanks, merging to master

Member

@viirya viirya left a comment

Looks good to me.

SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
Closes apache#38713 from cloud-fan/pivot.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 15, 2022
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022