-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-20417][SQL] Move subquery error handling to checkAnalysis from Analyzer #17713
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #76023 has finished for PR 17713 at commit
|
|
Test build #76043 has finished for PR 17713 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just copy it from Analyzer.scala? Please leave some comments for saving the time of reviewers. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile Thanks for pointing this. Yes.. Its basically copied from Analyzer.scala.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These negative test cases issue the exactly same errors? I mean, before this PR and after this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile Actually there are a few minor difference in the errors that are issued. Thats the reason i decided to create the new tests for them to capture these new errors. Here are the difference.
-
In Subquery.
When the the number of columns in left hand and right hand sides of in subquery didn't match we raised the error from here in Analyzer. This handled both In and Scalar subquery.
Now we capture it in checkInputTypes here and return a slightly clearer error message to the user. -
Scalar Subquery
The number of output column > 1 condition was being handled in two places, one in Analyzer here that dealt with correlated scalar subquery and one in checkAnalysis here dealt with non-correlated scalar subquery. This is now consolidated at a single place here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note to reviewers: This function basically refactors the validation logic for subquery expressions from checkAnalysis. This is the entry point function to do all the validation for subquery is is called from checkAnalysis().
|
ping @gatorsmile @hvanhovell |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expressions is not needed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile Yeah.. its not needed. Fixed. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we output what are left sides and what are right sides? When we have nested subqueries in a very complex user query, it could be hard for users to locate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile I have made changes. Can you please check and let me know if it looks okay to you ? I have also added the subquery expression id for better correlation. Please let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile Same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you update it and make it more clear? Basically, we are resolving the RHS plans in the subquery and populating the children using outer references, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile Have made changes. Pl. let me know if it looks ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkAggregate is only for Scalar subqueries. Maybe you can rename it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile Renamed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same here. It is only for Scalar subqueries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile renamed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to making it longer but clearer. inOrExistsSubquery -> inSubqueryOrExistsSubquery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile okay. changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Predicate sub-queries -> IN/EXISTS predicate sub-queries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s" -> "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
must be Aggregated -> must be aggregated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Indent issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, this checking logics in this function is invoked only when sub is resolved. Do we still need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile Yeah.. i had thought about it as well. I have moved this now after checkAnalysis on the sub plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: foundNonEqualCorrelatedPred : Boolean -> foundNonEqualCorrelatedPred: Boolean
|
It looks pretty good. Just left some comments. Thanks! |
|
@gatorsmile Thanks a lot. Have addressed your comments. Please check when you get a chance. |
|
Test build #76544 has finished for PR 17713 at commit
|
|
retest this please |
|
Test build #76547 has finished for PR 17713 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subquery expression id: #${exprId.id} does not help to the end users, I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile OK.. since we will have the plan information available for these analysis error, i was thinking it was possible to co-relate the error with the originating subquery expression. Let me remove it, given you think it may not be useful to the end-users as they may not be familiar with the system generated expression id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about cleanQueryInScalarSubquery?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile OK.
|
LGTM except minor comments. cc @hvanhovell |
|
Test build #76553 has finished for PR 17713 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the PR: #17930
It should be moved earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile Thanks for letting me know. I will change.
3c4f38e to
9ef29c7
Compare
|
Test build #76761 has finished for PR 17713 at commit
|
|
retest this please |
|
Test build #76789 has finished for PR 17713 at commit
|
|
ping @hvanhovell |
|
@dilipbiswal Could you resolve the conflict? |
9ef29c7 to
c2a7555
Compare
|
@gatorsmile Just rebased. Thanks !! |
|
Test build #78418 has finished for PR 17713 at commit
|
| |[${valExprs.map(_.dataType.catalogString).mkString(", ")}]. | ||
| |Right side: | ||
| |[${sub.output.map(_.dataType.catalogString).mkString(", ")}]. | ||
| |The number of columns the left hand side of an IN subquery does not match the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of columns the left -> The number of columns in the left
| override def checkInputDataTypes(): TypeCheckResult = { | ||
| list match { | ||
| case ListQuery(sub, _, _) :: Nil => | ||
| case ListQuery(sub, _, exprId) :: Nil => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert it back?
|
LGTM except two minor comments. |
|
Test build #78512 has finished for PR 17713 at commit
|
|
Thanks! Merging to master. |
|
@gatorsmile Thank you very much !! |
… Analyzer ## What changes were proposed in this pull request? Currently we do a lot of validations for subquery in the Analyzer. We should move them to CheckAnalysis which is the framework to catch and report Analysis errors. This was mentioned as a review comment in SPARK-18874. ## How was this patch tested? Exists tests + A few tests added to SQLQueryTestSuite. Author: Dilip Biswal <[email protected]> Closes apache#17713 from dilipbiswal/subquery_checkanalysis.
What changes were proposed in this pull request?
Currently we do a lot of validations for subquery in the Analyzer. We should move them to CheckAnalysis which is the framework to catch and report Analysis errors. This was mentioned as a review comment in SPARK-18874.
How was this patch tested?
Exists tests + A few tests added to SQLQueryTestSuite.