
Conversation

@skambha (Contributor) commented Feb 25, 2017

What changes were proposed in this pull request?

  • Add tests covering different scenarios with qualified column names (a brief sketch of the kinds of scenarios follows below).
  • See Section 2 of the design doc for the full list of test scenarios.
  • As part of SPARK-19602, changes are made to support three-part column names. To aid review and reduce the diff size, the test scenarios are separated out into this PR.
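For illustration, the covered scenarios range over patterns like the following (a minimal sketch; the actual test files cover many more combinations, including negative cases):

-- Illustrative qualified-column patterns (the DDL here is hypothetical):
CREATE DATABASE mydb1;
USE mydb1;
CREATE TABLE t1 USING parquet AS SELECT 1 AS i1;
SELECT i1 FROM t1;           -- unqualified column
SELECT t1.i1 FROM t1;        -- table-qualified column
SELECT mydb1.t1.i1 FROM t1;  -- database.table.column (three-part name)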

How was this patch tested?

  • This is a test-only change. The individual test suites were run successfully.

@gatorsmile (Member)

ok to test

@gatorsmile (Member)

Some test cases are negative, and some cover scenarios that are expected to be supported. Could you split these out from the positive test cases? Right now some test cases are pretty large.

@SparkQA commented Feb 26, 2017

Test build #73479 has started for PR 17067 at commit 2f9937e.

@gatorsmile (Member)

Uh, I forgot one more point.

Could you move these test cases into our new end-to-end SQL query suite, SQLQueryTestSuite?
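(For context: SQLQueryTestSuite runs plain .sql input files under sql/core/src/test/resources/sql-tests/inputs and checks each query's result against a checked-in golden file under sql-tests/results; golden files are regenerated by re-running the suite with SPARK_GENERATE_GOLDEN_FILES=1. The paths and workflow here are from general knowledge of the framework and may differ by version.) A minimal input/golden pair might look like:

-- inputs/example.sql (hypothetical file name):
SELECT 1 AS one;

-- results/example.sql.out (hypothetical excerpt; header lines omitted):
-- !query 0
SELECT 1 AS one
-- !query 0 schema
struct<one:int>
-- !query 0 output
1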

@gatorsmile (Member)

Thank you for working on this! I like the coverage of qualified column names.

@gatorsmile (Member)

retest this please

@SparkQA commented Feb 26, 2017

Test build #73494 has finished for PR 17067 at commit 2f9937e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member) commented Feb 28, 2017

We can move the remaining test cases to SQLQueryTestSuite, right? No need to put all of them in the same file.

@skambha (Contributor, Author) commented Feb 28, 2017

Thanks much Xiao for the review and comments.

I have made the following changes:

  • Separated out the negative cases from the positive cases.
  • Moved the positive tests, and also the cases that should be supported, into the SQLQueryTestSuite framework. A new test file, columnresolution.sql, and its corresponding golden .out file are added.
  • Cleaned up ColumnResolutionSuite to remove cases that are covered by SQLQueryTestSuite.
  • Kept the negative cases in ColumnResolutionSuite because the exprId shows up in the exception message.
  • Also kept the tests against a Hive serde table in ColumnResolutionSuite, since I wanted coverage for that case.

Please advise if we should move any others. Thanks.

Member

How about these test cases for temporary views?

-- Test data.
CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null)
AS testData(a, b);
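Qualified references against such a view could then be exercised with queries along these lines (an illustrative sketch; not necessarily the exact tests that were added):

SELECT testData.a FROM testData;
SELECT a, testData.b FROM testData;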

Contributor (Author)

Sure, let me look at converting these too. Thanks.

Member

I think this is also doable for global temporary views.

Contributor (Author)

Hi Xiao, I have moved my new local temp view tests and the global temp view tests to the SQLQueryTestSuite framework as well. Please take a look. Thanks.

Member

For the test cases you want to keep here, you can move them to sql/core. Why do we need to test Hive serde tables? Compared with data source tables, do they touch different code paths to resolve columns?

Contributor (Author)

The logic to resolve the column in the LogicalPlan is the same; there is no change there. I wanted to test the Hive table to make sure that the qualifier information is correctly set. We update the qualifier info in MetastoreRelation, so I wanted to have coverage for a Hive table.
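For illustration, the kind of Hive serde coverage meant here might look like the following (hypothetical table name and DDL; the suite's actual tables may differ):

-- A Hive serde table, as opposed to a data source table created with USING:
CREATE TABLE mydb1.hive_t1 (i1 INT) STORED AS PARQUET;
-- Qualified reference whose qualifier is populated from the metastore relation:
SELECT hive_t1.i1 FROM mydb1.hive_t1;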

Member

Please use upper case for SQL keywords.

@SparkQA commented Feb 28, 2017

Test build #73561 has finished for PR 17067 at commit e4f347e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skambha force-pushed the colResolutionTests branch from e4f347e to 1ae20ec on March 1, 2017 at 00:02
@SparkQA commented Mar 1, 2017

Test build #73622 has finished for PR 17067 at commit 1ae20ec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skambha force-pushed the colResolutionTests branch from 1ae20ec to 5594eb0 on March 2, 2017 at 00:28
@skambha (Contributor, Author) commented Mar 2, 2017

  • Changed the SQLQueryTestSuite framework to mask the exprId, so that the negative cases can also be added using this framework.
  • Added the negative test cases to the SQLQueryTestSuite framework and removed the Hive-specific test suite. For the Hive table test case, I will add that test as part of the actual code-changes PR.
  • I synced up with the codeline; one test output, inner-join.sql.out, needed a comment to be updated, so I have updated that as well.

@SparkQA commented Mar 2, 2017

Test build #73722 has finished for PR 17067 at commit 5594eb0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

-- reset
set spark.sql.crossJoin.enabled = false;
DROP DATABASE mydb1 CASCADE;
DROP DATABASE mydb2 CASCADE;
Member

Please add one empty line after this line.

SELECT mydb1.t1.i1 FROM t1;

-- reset
set spark.sql.crossJoin.enabled = false;
Member

Move this to line 24

@@ -0,0 +1,25 @@
-- Tests for qualified column names for the view code-path
-- Test scenario with Temporary view
CREATE OR REPLACE TEMPORARY VIEW table1 AS SELECT 2 AS i1;
Member

table1 -> view1

DROP VIEW table1;

-- Test scenario with Global Temp view
CREATE OR REPLACE GLOBAL TEMPORARY VIEW t1 as SELECT 1 as i1;
Member

t1 -> view1

+ val msg = if (a.plan.nonEmpty) a.getSimpleMessage else a.getMessage
  (StructType(Seq.empty),
-   Seq(a.getClass.getName, a.getSimpleMessage.replaceAll("#\\d+", "#x")))
+   Seq(a.getClass.getName, msg.replaceAll("#\\d+", "#x")))
Member

Nit: (StructType(Seq.empty), Seq(a.getClass.getName, msg.replaceAll("#\\d+", "#x")))

struct<>
-- !query 9 output
org.apache.spark.sql.AnalysisException
Reference 't1.i1' is ambiguous, could be: i1#x, i1#x.; line 1 pos 7
Member

To the other reviewer, this is the reason why we need to make a change in SQLQueryTestSuite.scala

INSERT INTO t5 VALUES(1, (2, 3));
SELECT t5.i1 FROM t5;
SELECT t5.t5.i1 FROM t5;
SELECT t5.t5.i1 FROM mydb1.t5;
@gatorsmile (Member) commented Mar 2, 2017

Add two more cases for verifying *
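For instance, such cases might look like the following (an illustrative guess; the exact queries added may differ):

SELECT t5.* FROM t5;
SELECT mydb1.t5.* FROM mydb1.t5;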

@gatorsmile (Member)

Generally, it looks good to me.

@SparkQA commented Mar 2, 2017

Test build #73779 has finished for PR 17067 at commit b2e411c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member)

LGTM

@gatorsmile (Member)

Thanks! Merging to master.

@asfgit closed this in f37bb14 on Mar 3, 2017
@skambha (Contributor, Author) commented Mar 3, 2017

Thanks a lot Xiao.

asfgit pushed a commit that referenced this pull request Aug 7, 2018
…n name ( 3 part name)

## What changes were proposed in this pull request?
The design details are attached to the JIRA issue [here](https://drive.google.com/file/d/1zKm3aNZ3DpsqIuoMvRsf0kkDkXsAasxH/view)

A high-level overview of the changes:
- Enhance the qualifier to be more than one string.
- Add support to store the qualifier. Enhance lookupRelation to keep the qualifier appropriately.
- Enhance the table-matching column resolution algorithm to account for the qualifier being more than a string.
- Enhance the table-matching algorithm in UnresolvedStar.expand.
- Ensure that we continue to support select t1.i1 from db1.t1 (see the sketch below).
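A minimal sketch of the query shapes this enables, reusing the db1/t1/i1 names from the example above (illustrative only, not the exact tests):

SELECT t1.i1 FROM db1.t1;        -- two-part qualifier, still supported
SELECT db1.t1.i1 FROM db1.t1;    -- three-part (database.table.column) qualifier
SELECT db1.t1.* FROM db1.t1;     -- star expansion with a qualified prefix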

## How was this patch tested?
- New tests are added.
- Several test scenarios were added in a separate [test PR 17067](#17067). The tests that were not supported earlier are marked with TODO markers, and those are now supported with the code changes here.
- Existing unit tests (hive, catalyst, and sql) were run successfully.

Closes #17185 from skambha/colResolution.

Authored-by: Sunitha Kambhampati <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>