@allisonwang-db

What changes were proposed in this pull request?

This PR updates splitSubquery in RewriteCorrelatedScalarSubquery to support non-aggregated one-row subqueries.

In CheckAnalysis, we allow three types of correlated scalar subquery patterns:

  1. SubqueryAlias/Project + Aggregate
  2. SubqueryAlias/Project + Filter + Aggregate
  3. SubqueryAlias/Project + LogicalPlan (maxRows <= 1)

cleanQueryInScalarSubquery(query) match {
  case a: Aggregate => checkAggregateInScalarSubquery(outerAttrs, query, a)
  case Filter(_, a: Aggregate) => checkAggregateInScalarSubquery(outerAttrs, query, a)
  case p: LogicalPlan if p.maxRows.exists(_ <= 1) => // Ok
  case other =>
    expr.failAnalysis(

We should support the third case in splitSubquery to avoid "Unexpected operator" exceptions.
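
To illustrate the three patterns, here is a minimal, self-contained sketch of a split function that also accepts the one-row case. It is not the Spark implementation: the plan classes below are toy stand-ins for the Catalyst nodes, and the (topPart, havingNode, aggNode) return shape is assumed from the review discussion in this PR.

```scala
// Toy stand-ins for Catalyst plan nodes. These are NOT the real Spark classes.
object SplitSubquerySketch {
  sealed trait Plan { def maxRows: Option[Long] }
  final case class Relation(name: String, maxRows: Option[Long]) extends Plan
  final case class Aggregate(child: Plan) extends Plan {
    // Assumption for this sketch: a global aggregate yields exactly one row.
    val maxRows: Option[Long] = Some(1L)
  }
  final case class Filter(condition: String, child: Plan) extends Plan {
    def maxRows: Option[Long] = child.maxRows
  }
  final case class Project(child: Plan) extends Plan {
    def maxRows: Option[Long] = child.maxRows
  }
  final case class SubqueryAlias(alias: String, child: Plan) extends Plan {
    def maxRows: Option[Long] = child.maxRows
  }

  // Returns (operators above the aggregate, optional HAVING filter, optional aggregate).
  def splitSubquery(plan: Plan): (Seq[Plan], Option[Filter], Option[Aggregate]) = {
    @scala.annotation.tailrec
    def loop(current: Plan, top: List[Plan]): (Seq[Plan], Option[Filter], Option[Aggregate]) =
      current match {
        // Pattern 1: SubqueryAlias/Project + Aggregate
        case a: Aggregate => (top.reverse, None, Some(a))
        // Pattern 2: SubqueryAlias/Project + Filter + Aggregate
        case f @ Filter(_, a: Aggregate) => (top.reverse, Some(f), Some(a))
        // Walk down through the wrapping operators, remembering them.
        case p: Project => loop(p.child, p :: top)
        case s: SubqueryAlias => loop(s.child, s :: top)
        // Pattern 3: any plan known to produce at most one row; there is no aggregate.
        case p if p.maxRows.exists(_ <= 1) => (top.reverse, None, None)
        case other => throw new IllegalStateException(s"Unexpected operator: $other")
      }
    loop(plan, Nil)
  }
}
```

In this sketch a caller can tell the third pattern apart because the aggregate slot comes back as None, while anything outside the three patterns still fails loudly.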

Why are the changes needed?

To fix an issue with correlated subquery rewrite.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New unit tests.

@github-actions github-actions bot added the SQL label Oct 21, 2022
@allisonwang-db allisonwang-db force-pushed the spark-40862-split-subquery branch from 815d3e1 to 0120a70 Compare October 24, 2022 20:16
@allisonwang-db

cc @cloud-fan

@jchen5 jchen5 left a comment

Looks good to me, one small comment

Looks like this comment needs to be updated - in the new case it's returning None, rather than the inner query block below HAVING (and this is ok because we only needed the aggregate to fix the COUNT bug). Right?
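
For readers unfamiliar with the COUNT bug mentioned here, the sketch below reproduces it with plain Scala collections rather than Spark. The data and helper names are made up for illustration. A correlated COUNT subquery should return 0 for an outer row with no matches, but a naive "aggregate, then left outer join" rewrite returns a missing value (NULL) instead.

```scala
// Plain-collections illustration of the COUNT bug; no Spark involved.
object CountBugSketch {
  val outer: Seq[Int] = Seq(1, 2, 3) // keys of the outer table (hypothetical data)
  val inner: Seq[Int] = Seq(1, 1, 3) // keys of the inner table (hypothetical data)

  // Reference semantics: evaluate the COUNT subquery once per outer row.
  def perRow: Seq[(Int, Option[Long])] =
    outer.map(k => (k, Option(inner.count(_ == k).toLong)))

  // Naive rewrite: aggregate the inner side per key, then left outer join.
  // Keys with no inner rows have no group, so the join yields None (NULL).
  def naiveJoin: Seq[(Int, Option[Long])] = {
    val counts: Map[Int, Long] =
      inner.groupBy(identity).map { case (k, vs) => (k, vs.size.toLong) }
    outer.map(k => (k, counts.get(k)))
  }

  // Patched rewrite: substitute the aggregate's value over zero rows (0 for COUNT).
  def patchedJoin: Seq[(Int, Option[Long])] =
    naiveJoin.map { case (k, c) => (k, c.orElse(Some(0L))) }
}
```

A plan that contains no aggregate cannot exhibit this mismatch, which is consistent with the point above that the aggregate part is only needed to fix the COUNT bug.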

Yea, we should at least explain when the third part can be None.

how about we make the case 1 result a lazy val? (multiple variable lazy val looks weird)

lazy val planWithoutCountBug = Project(... // Or just val as constructing logical plan is cheap
if (resultWithZeroTups.isEmpty) {
  planWithoutCountBug
} else {
  val (topPart, havingNode, aggNode) = splitSubquery(query)
  if (aggNode.isEmpty) planWithoutCountBug else ...
}
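
The branching suggested above can be tried out in isolation with the following sketch. It is only an illustration of the control flow: plans are represented as plain strings, and choosePlan, the split parameter, and the returned labels are hypothetical names, not Spark code.

```scala
// Control-flow sketch of the suggestion; plans are plain strings here.
object RewriteBranchSketch {
  def choosePlan(
      resultWithZeroTups: Option[Any],
      split: () => Option[String] // stands in for the aggNode part of splitSubquery
  ): String = {
    // Cheap to build, so a plain val is used instead of a lazy val.
    val planWithoutCountBug = "plan-without-count-bug"
    resultWithZeroTups match {
      // Case 1: the subquery is not subject to the COUNT bug.
      case None => planWithoutCountBug
      case Some(zero) =>
        split() match {
          // No aggregate (one-row subquery): reuse the simple plan.
          case None => planWithoutCountBug
          // An aggregate exists: the COUNT bug handling is still needed.
          case Some(agg) => s"count-bug-fix($agg, default=$zero)"
        }
    }
  }
}
```

The split function is only invoked when a zero-tuple result exists, matching the shape of the suggestion where splitSubquery is called inside the else branch.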

@allisonwang-db allisonwang-db force-pushed the spark-40862-split-subquery branch from f24ead1 to 8ea1954 Compare October 27, 2022 16:02
@allisonwang-db allisonwang-db force-pushed the spark-40862-split-subquery branch from 8ea1954 to 7f4cf74 Compare October 27, 2022 23:46
@cloud-fan

thanks, merging to master!

@cloud-fan cloud-fan closed this in 3feddec Oct 28, 2022
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…atedScalarSubquery

### What changes were proposed in this pull request?
This PR updates `splitSubquery` in `RewriteCorrelatedScalarSubquery` to support non-aggregated one-row subqueries.

In CheckAnalysis, we allow three types of correlated scalar subquery patterns:
1. SubqueryAlias/Project + Aggregate
2. SubqueryAlias/Project + Filter + Aggregate
3. SubqueryAlias/Project + LogicalPlan (maxRows <= 1)

https://github.com/apache/spark/blob/748fa2792e488a6b923b32e2898d9bb6e16fb4ca/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L851-L856

We should support the third case in `splitSubquery` to avoid `Unexpected operator` exceptions.

### Why are the changes needed?
To fix an issue with correlated subquery rewrite.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
New unit tests.

Closes apache#38336 from allisonwang-db/spark-40862-split-subquery.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>