[SPARK-40862][SQL] Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery #38336
cc @cloud-fan
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala
jchen5 left a comment
Looks good to me, one small comment
Looks like this comment needs to be updated - in the new case it's returning None, rather than the inner query block below HAVING (and this is ok because we only needed the aggregate to fix the COUNT bug). Right?
Yea, we should at least explain when the third part can be None.
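For readers unfamiliar with the COUNT bug mentioned above, here is a minimal sketch using plain Scala collections rather than Spark. All names (`CountBugSketch`, `outer`, `inner`, and the three methods) are hypothetical and only illustrate the general problem: rewriting a correlated `COUNT` subquery as aggregate-then-left-outer-join yields a missing value instead of 0 for outer rows with no match, so the rewrite has to substitute the value the aggregate would produce on zero input rows.

```scala
// Toy illustration of the COUNT bug with plain collections; this is not Spark code.
object CountBugSketch {
  val outer: Seq[Int] = Seq(1, 2, 3)
  val inner: Seq[Int] = Seq(1, 1, 2) // there are no inner rows for key 3

  // Correlated semantics: evaluate COUNT(*) once per outer row.
  def correlated: Seq[(Int, Option[Long])] =
    outer.map(k => k -> Some(inner.count(_ == k).toLong))

  // Naive rewrite: aggregate the inner side first, then left outer join.
  // Keys that never appear in `inner` have no group, so the join yields None.
  def naiveJoin: Seq[(Int, Option[Long])] = {
    val counts: Map[Int, Long] =
      inner.groupBy(identity).map { case (k, vs) => k -> vs.size.toLong }
    outer.map(k => k -> counts.get(k))
  }

  // Fixed rewrite: fall back to the value COUNT(*) produces on zero rows.
  def fixedJoin: Seq[(Int, Option[Long])] =
    naiveJoin.map { case (k, c) => k -> c.orElse(Some(0L)) }
}
```

A subquery with no aggregate has no such zero-row value to patch up, which is consistent with the point above that the aggregate part is only needed to handle the COUNT bug.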
how about we make the case 1 result a lazy val? (multiple variable lazy val looks weird)
    lazy val planWithoutCountBug = Project(... // Or just val as constructing logical plan is cheap
    if (resultWithZeroTups.isEmpty) {
      planWithoutCountBug
    } else {
      val (topPart, havingNode, aggNode) = splitSubquery(query)
      if (aggNode.isEmpty) planWithoutCountBug else ...
    }
thanks, merging to master!
…atedScalarSubquery

### What changes were proposed in this pull request?

This PR updates `splitSubquery` in `RewriteCorrelatedScalarSubquery` to support non-aggregated one-row subqueries.

In CheckAnalysis, we allow three types of correlated scalar subquery patterns:
1. SubqueryAlias/Project + Aggregate
2. SubqueryAlias/Project + Filter + Aggregate
3. SubqueryAlias/Project + LogicalPlan (maxRows <= 1)

https://github.com/apache/spark/blob/748fa2792e488a6b923b32e2898d9bb6e16fb4ca/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L851-L856

We should support the third case in `splitSubquery` to avoid `Unexpected operator` exceptions.

### Why are the changes needed?

To fix an issue with correlated subquery rewrite.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

New unit tests.

Closes apache#38336 from allisonwang-db/spark-40862-split-subquery.

Authored-by: allisonwang-db
Signed-off-by: Wenchen Fan
What changes were proposed in this pull request?
This PR updates `splitSubquery` in `RewriteCorrelatedScalarSubquery` to support non-aggregated one-row subqueries.
In CheckAnalysis, we allow three types of correlated scalar subquery patterns:
1. SubqueryAlias/Project + Aggregate
2. SubqueryAlias/Project + Filter + Aggregate
3. SubqueryAlias/Project + LogicalPlan (maxRows <= 1)
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
Lines 851 to 856 in 748fa27
We should support the third case in `splitSubquery` to avoid `Unexpected operator` exceptions.
Why are the changes needed?
To fix an issue with correlated subquery rewrite.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New unit tests.
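
The three subquery shapes from the PR description can be modelled with a small, self-contained sketch. This is not Spark's implementation: the `Plan` hierarchy below (`Relation`, `Aggregate`, `Filter`, `Project`, `SubqueryAlias`) is a hypothetical stand-in for Catalyst's plan nodes, and this `splitSubquery` only assumes the three-part return shape suggested by the review snippet `val (topPart, havingNode, aggNode) = splitSubquery(query)`. It shows how a one-row, non-aggregated plan could be accepted with an empty aggregate part rather than hitting an "Unexpected operator" error.

```scala
object SplitSubquerySketch {
  // Hypothetical stand-ins for Catalyst plan nodes; these are not Spark's classes.
  sealed trait Plan { def maxRows: Option[Long] }
  final case class Relation(name: String, maxRows: Option[Long]) extends Plan
  final case class Aggregate(child: Plan) extends Plan {
    def maxRows: Option[Long] = None // kept unknown in this toy model
  }
  final case class Filter(condition: String, child: Plan) extends Plan {
    def maxRows: Option[Long] = child.maxRows
  }
  final case class Project(child: Plan) extends Plan {
    def maxRows: Option[Long] = child.maxRows
  }
  final case class SubqueryAlias(alias: String, child: Plan) extends Plan {
    def maxRows: Option[Long] = child.maxRows
  }

  /** Splits a plan into (top Project/SubqueryAlias nodes, optional HAVING filter, optional aggregate). */
  def splitSubquery(plan: Plan): (Seq[Plan], Option[Filter], Option[Aggregate]) = {
    @annotation.tailrec
    def loop(current: Plan, top: Vector[Plan]): (Seq[Plan], Option[Filter], Option[Aggregate]) =
      current match {
        // Pattern 2: SubqueryAlias/Project + Filter + Aggregate
        case having @ Filter(_, agg: Aggregate) => (top, Some(having), Some(agg))
        // Pattern 1: SubqueryAlias/Project + Aggregate
        case agg: Aggregate => (top, None, Some(agg))
        // Collect the top part and keep descending.
        case p @ Project(child) => loop(child, top :+ p)
        case s @ SubqueryAlias(_, child) => loop(child, top :+ s)
        // Pattern 3: any plan known to return at most one row; no aggregate part.
        case p if p.maxRows.exists(_ <= 1) => (top, None, None)
        case other => throw new IllegalStateException(s"Unexpected operator: $other")
      }
    loop(plan, Vector.empty)
  }
}
```

In this sketch, `splitSubquery(Project(Relation("one_row", Some(1L))))` returns the `Project` as the top part with both optional parts empty, while an input whose row bound is unknown still fails with the "Unexpected operator" error.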