
Conversation

@imback82
Contributor

What changes were proposed in this pull request?

This PR proposes to fix an issue where qualified columns are not matched for v2 tables when the current catalog/namespace is used.

For v1 tables, you can currently perform the following:

```SQL
SELECT default.t.id FROM t;
```

For v2 tables, the following fails:

```SQL
USE testcat.ns1.ns2;
SELECT testcat.ns1.ns2.t.id FROM t;

org.apache.spark.sql.AnalysisException: cannot resolve '`testcat.ns1.ns2.t.id`' given input columns: [t.id, t.point]; line 1 pos 7;
```

Why are the changes needed?

It is a bug: qualified column names cannot be matched when the current catalog/namespace is used.

Does this PR introduce any user-facing change?

Yes, now the following works:

```SQL
USE testcat.ns1.ns2;
SELECT testcat.ns1.ns2.t.id FROM t;
```

How was this patch tested?

Added new tests
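
(For a concrete picture, here is a minimal sketch of the kind of test this adds, using the `testcat` catalog from the repro above; the table schema and the `USING foo` provider are assumptions, not the PR's actual test code.)

```scala
// Hedged sketch: assumes a session where "testcat" is registered as an
// in-memory v2 catalog, as in the repro above.
spark.sql("USE testcat.ns1.ns2")
spark.sql("CREATE TABLE t (id BIGINT, point STRUCT<x: BIGINT, y: BIGINT>) USING foo")

// Before this fix: AnalysisException, "cannot resolve 'testcat.ns1.ns2.t.id'".
// After this fix: the fully qualified column resolves against t.
spark.sql("SELECT testcat.ns1.ns2.t.id FROM t").show()
```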

@imback82
Contributor Author

cc @cloud-fan @brkyvz

@SparkQA

SparkQA commented Feb 11, 2020

Test build #118191 has finished for PR 27532 at commit 10b19d9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 11, 2020

Test build #118183 has finished for PR 27532 at commit 5706a3e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

@imback82 can you try some other databases like Hive, Presto, SQL Server? `SELECT default.t.id FROM t` looks a little counterintuitive to me.

@imback82
Contributor Author

I tried Postgres and MySQL, and both allow this syntax.

postgres:

```
postgres=# create schema s1;
CREATE SCHEMA

postgres=# create table s1.t (i int);
CREATE TABLE

postgres=# SET search_path TO s1;
SET

postgres=# select t.i from t;
 i
---
(0 rows)

postgres=# select s1.t.i from t;
 i
---
(0 rows)
```

mysql:

```
mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> use test;
Database changed

mysql> create table t (i int);
Query OK, 0 rows affected (0.00 sec)

mysql> select t.i from t;
Empty set (0.00 sec)

mysql> select test.t.i from t;
Empty set (0.00 sec)
```

@imback82
Contributor Author

imback82 commented Feb 12, 2020

In contrast, Hive doesn't accept this syntax:

```
%jdbc(hive)
select tbl.t8 from default.tbl;
# this works

%jdbc(hive)
select default.tbl.t8 from default.tbl;
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or column reference 'default': (possible column names are: t8, t9)
```

I think we need to at least make the behavior consistent between v1 and v2 tables.

```scala
// not belong to any namespaces. For v1 tables, namespace is resolved in
// `SessionCatalog.getRelation`.
val ns = if (ident.namespace.isEmpty) {
  catalogManager.currentNamespace
```
Contributor

What if the namespace is really []?
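
(To make the concern concrete, a hedged sketch: a v2 identifier can legitimately have an empty namespace, and this branch would then substitute the current namespace where none is wanted.)

```scala
import org.apache.spark.sql.connector.catalog.Identifier

// A v2 table registered directly under the catalog root: its namespace is
// genuinely empty, so `ident.namespace.isEmpty` holds and the code above
// would re-qualify it under catalogManager.currentNamespace.
val rootIdent = Identifier.of(Array.empty[String], "t")
assert(rootIdent.namespace.isEmpty)
```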

Contributor

I'm fixing it in #27550

Contributor Author

Good catch! I will take a look at the PR. Thanks!

Contributor Author

This is still problematic even after #27550 because we allow `spark_catalog.t`. It will be resolved to an identifier with an empty namespace in `CatalogAndIdentifier`, whereas `v1SessionCatalog` always uses the current database if the given namespace (database) is empty. Should I just go ahead and disallow `spark_catalog.t`? What do you think? We briefly discussed this issue here: #27550 (comment)
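
(A hedged illustration of the mismatch being described, with simplified types; this is not the PR's code.)

```scala
import org.apache.spark.sql.catalyst.TableIdentifier

// "spark_catalog.t" splits into a known catalog name plus a one-part name:
val nameParts = Seq("spark_catalog", "t")
val (catalogName, rest) = (nameParts.head, nameParts.tail) // rest == Seq("t")

// v2-style resolution keeps the empty namespace, i.e. ([], "t"),
// while the v1 session catalog treats a missing database as "current":
val v1Ident = TableIdentifier("t", None) // None => filled in with currentDb
// The two paths can point at different tables, which is the argument for
// disallowing "spark_catalog.t" without an explicit database.
```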

Contributor

yea let's do it!

Contributor Author

will do!

Contributor Author

@cloud-fan this is now fixed without this hack.

@SparkQA

SparkQA commented Feb 25, 2020

Test build #118927 has finished for PR 27532 at commit 6660d5c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
case table =>
  SubqueryAlias(
    identifier,
    ident.asMultipartIdentifier,
```
Contributor

Shall we add the catalog name too, to support cases like `select spark_catalog.default.t.i from t`?

Contributor Author

imback82 commented Feb 26, 2020

We could, but it would not be consistent with v1 table behavior. I was thinking about adding this support when I update the resolution rule for session catalogs: #27391 (comment). What do you think, should I do it now?

Contributor

ah ok. Let's leave it for now.

```scala
Seq(true, false).foreach { useV1Table =>
  val format = if (useV1Table) "json" else v2Format
  if (useV1Table) {
    spark.conf.unset(V2_SESSION_CATALOG_IMPLEMENTATION.key)
```
Contributor

Shall we keep this comment: `// unset this config to use the default v2 session catalog.`

```scala
checkAnswer(sql("select t.i from spark_catalog.default.t"), Row(1))
checkAnswer(sql("select default.t.i from spark_catalog.default.t"), Row(1))

// catalog name cannot be used for v1 tables.
```
Contributor

v1 tables -> tables in the session catalog, as we are testing both v1 and v2 tables here.

Contributor Author

Thanks, updated.
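
(For context, a hedged sketch of the negative case from this thread, written against ScalaTest's `intercept` as used in Spark's SQL suites; the exact error text is an assumption.)

```scala
import org.apache.spark.sql.AnalysisException

// Tables in the session catalog are not aliased with the catalog name, so a
// catalog-qualified column reference should fail to resolve:
val e = intercept[AnalysisException] {
  sql("SELECT spark_catalog.default.t.i FROM spark_catalog.default.t")
}
assert(e.getMessage.contains("cannot resolve"))
```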

@SparkQA

SparkQA commented Feb 26, 2020

Test build #118947 has finished for PR 27532 at commit f23ded6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Feb 26, 2020

Test build #118970 has finished for PR 27532 at commit f23ded6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan closed this in 7330547 on Feb 26, 2020
@cloud-fan
Contributor

thanks, merging to master/3.0!

@cloud-fan
Contributor

BTW, don't forget to remove the hack in `TempViewOrV1Table` :)

cloud-fan pushed a commit that referenced this pull request Feb 26, 2020
…namespace for v2 tables

Closes #27532 from imback82/qualifed_col_respect_current.

Authored-by: Terry Kim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 7330547)
Signed-off-by: Wenchen Fan <[email protected]>
@imback82
Contributor Author

Yes, I am working on it next!

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…namespace for v2 tables

Closes apache#27532 from imback82/qualifed_col_respect_current.

Authored-by: Terry Kim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>