@dilipbiswal
Contributor

What changes were proposed in this pull request?

Currently, much of the subquery validation is done in the Analyzer. We should move it to CheckAnalysis, which is the framework for catching and reporting analysis errors. This was raised as a review comment in SPARK-18874.

How was this patch tested?

Existing tests, plus a few new tests added to SQLQueryTestSuite.

@SparkQA
SparkQA commented Apr 21, 2017

Test build #76023 has finished for PR 17713 at commit 17eebd4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor Author

cc @hvanhovell @gatorsmile

@SparkQA
SparkQA commented Apr 21, 2017

Test build #76043 has finished for PR 17713 at commit 39e8cf7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

Is this just copied from Analyzer.scala? Please leave some comments to save reviewers' time. Thanks!

Contributor Author

@gatorsmile Thanks for pointing this out. Yes, it's basically copied from Analyzer.scala.

Member

Do these negative test cases issue exactly the same errors before and after this PR?

Contributor Author

@gatorsmile Actually, there are a few minor differences in the errors that are issued. That's why I decided to create new tests to capture the new errors. Here are the differences:

  1. IN subquery.
    When the number of columns on the left-hand and right-hand sides of an IN subquery didn't match, we raised the error from here in the Analyzer. That check handled both IN and scalar subqueries.
    Now we catch it in checkInputDataTypes here and return a slightly clearer error message to the user.

  2. Scalar subquery.
    The "more than one output column" condition was handled in two places: one in the Analyzer here, which dealt with correlated scalar subqueries, and one in checkAnalysis here, which dealt with non-correlated scalar subqueries. This is now consolidated in a single place here.
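
As a rough illustration of the two checks described above, here is a minimal sketch. It uses hypothetical stand-in types (`SubPlan`, `checkInSubquery`, `checkScalarSubquery`) rather than Spark's actual classes, and the message wording is invented for the example.

```scala
// Hypothetical, simplified model; these are NOT Spark's classes or messages.
object SubqueryColumnChecks {
  // Stand-in for a subquery plan: only its output column names matter here.
  final case class SubPlan(output: Seq[String])

  // IN subquery: the left-hand side must supply one value per subquery output column.
  def checkInSubquery(lhs: Seq[String], sub: SubPlan): Either[String, Unit] =
    if (lhs.length != sub.output.length)
      Left(s"IN subquery column count mismatch: ${lhs.length} on the left, " +
        s"${sub.output.length} on the right")
    else Right(())

  // Scalar subquery: a single check, applied whether or not the subquery is correlated.
  def checkScalarSubquery(sub: SubPlan): Either[String, Unit] =
    if (sub.output.length > 1)
      Left(s"Scalar subquery must return only one column, found ${sub.output.length}")
    else Right(())
}
```

Returning `Either` keeps the sketch free of exceptions; in the real code path the failure would surface as an analysis error.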

Contributor Author

Note to reviewers: this function factors the validation logic for subquery expressions out of checkAnalysis. It is the entry point for all subquery validation and is called from checkAnalysis().

@dilipbiswal
Contributor Author

ping @gatorsmile @hvanhovell

Member

expressions is not needed, right?

Contributor Author

@gatorsmile Yeah, it's not needed. Fixed. Thanks.

Member

Could we output what the left side and the right side are? With nested subqueries in a very complex user query, it could be hard for users to locate the problem.

Contributor Author

@gatorsmile I have made changes. Can you please check and let me know if it looks okay to you? I have also added the subquery expression id for better correlation. Please let me know what you think.
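
For illustration only, a message that names both sides might be assembled roughly like this. The object and method names are hypothetical, the wording is not the PR's final text, and the types are passed as plain strings rather than Spark `DataType` values.

```scala
// Hypothetical helper; not part of Spark. It only shows the shape of the message.
object InSubqueryMessage {
  // Builds a mismatch message that lists the data types found on each side,
  // so users can locate the offending subquery inside a larger query.
  def mismatch(leftTypes: Seq[String], rightTypes: Seq[String]): String =
    s"""Column count mismatch in IN subquery.
       |Left side:
       |[${leftTypes.mkString(", ")}].
       |Right side:
       |[${rightTypes.mkString(", ")}].""".stripMargin
}
```

Calling `InSubqueryMessage.mismatch(Seq("int", "string"), Seq("int"))` yields a five-line message with `[int, string].` under "Left side:" and `[int].` under "Right side:".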

Member

The same here

Contributor Author

@gatorsmile Same as above.

Member

Could you update it and make it clearer? Basically, we are resolving the RHS plans in the subquery and populating the children using outer references, right?

Contributor Author

@gatorsmile I have made changes. Please let me know if it looks OK.

Member

checkAggregate is only for Scalar subqueries. Maybe you can rename it.

Contributor Author

@gatorsmile Renamed.

Member

The same here. It is only for Scalar subqueries.

Contributor Author

@gatorsmile renamed.

Member

I prefer making it longer but clearer: inOrExistsSubquery -> inSubqueryOrExistsSubquery

Contributor Author

@gatorsmile okay. changed.

Member

Predicate sub-queries -> IN/EXISTS predicate sub-queries

Contributor Author

@gatorsmile changed.

Member

s" -> "

Member

must be Aggregated -> must be aggregated

Contributor Author

@gatorsmile changed.

Member

Nit: Indent issue.

Contributor Author

@gatorsmile fixed.

Member

Revert it back.

Member

Previously, the checking logic in this function was invoked only when sub is resolved. Do we still need it?

Contributor Author

@gatorsmile Yeah, I had thought about that as well. I have now moved this to after checkAnalysis runs on the sub plan.

Member

Nit: foundNonEqualCorrelatedPred : Boolean -> foundNonEqualCorrelatedPred: Boolean

@gatorsmile
Member

It looks pretty good. Just left some comments. Thanks!

@dilipbiswal
Contributor Author

@gatorsmile Thanks a lot. I have addressed your comments. Please check when you get a chance.

@SparkQA
SparkQA commented May 7, 2017

Test build #76544 has finished for PR 17713 at commit 8314954.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor Author

retest this please

@SparkQA
SparkQA commented May 7, 2017

Test build #76547 has finished for PR 17713 at commit 8314954.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

Subquery expression id: #${exprId.id} does not help end users, I think.

Contributor Author

@gatorsmile OK. Since the plan information is available for these analysis errors, I thought it would be possible to correlate the error with the originating subquery expression. I will remove it, since you think it may not be useful to end users, who may not be familiar with the system-generated expression id.

Member

How about cleanQueryInScalarSubquery?

@gatorsmile
Member

LGTM except minor comments.

cc @hvanhovell

@SparkQA
SparkQA commented May 8, 2017

Test build #76553 has finished for PR 17713 at commit 3c4f38e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

See the PR: #17930

It should be moved earlier.

Contributor Author

@gatorsmile Thanks for letting me know. I will change it.

@dilipbiswal force-pushed the subquery_checkanalysis branch from 3c4f38e to 9ef29c7 on May 11, 2017 00:29
@SparkQA
SparkQA commented May 11, 2017

Test build #76761 has finished for PR 17713 at commit 9ef29c7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor Author

retest this please

@SparkQA
SparkQA commented May 11, 2017

Test build #76789 has finished for PR 17713 at commit 9ef29c7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor Author

ping @hvanhovell

@gatorsmile
Member

@dilipbiswal Could you resolve the conflict?

@dilipbiswal force-pushed the subquery_checkanalysis branch from 9ef29c7 to c2a7555 on June 22, 2017 01:30
@dilipbiswal
Contributor Author

@gatorsmile Just rebased. Thanks !!

@SparkQA
SparkQA commented Jun 22, 2017

Test build #78418 has finished for PR 17713 at commit c2a7555.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

|[${valExprs.map(_.dataType.catalogString).mkString(", ")}].
|Right side:
|[${sub.output.map(_.dataType.catalogString).mkString(", ")}].
|The number of columns the left hand side of an IN subquery does not match the

Member

The number of columns the left -> The number of columns in the left

  override def checkInputDataTypes(): TypeCheckResult = {
    list match {
-     case ListQuery(sub, _, _) :: Nil =>
+     case ListQuery(sub, _, exprId) :: Nil =>

Member

Revert it back?

@gatorsmile
Member

LGTM except two minor comments.

@SparkQA
SparkQA commented Jun 23, 2017

Test build #78512 has finished for PR 17713 at commit b91aea2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merging to master.

@asfgit closed this in 13c2a4f on Jun 23, 2017
@dilipbiswal
Contributor Author

@gatorsmile Thank you very much !!

robert3005 pushed a commit to palantir/spark that referenced this pull request Jun 29, 2017
… Analyzer

Author: Dilip Biswal <[email protected]>

Closes apache#17713 from dilipbiswal/subquery_checkanalysis.